Abstract. We study fault-tolerant distributed optimization of a sum of convex (cost) functions with
real-valued scalar input/output in the presence of crash faults or Byzantine faults. In particular, the
goal is to optimize a global cost function $\frac{1}{n}\sum_{i\in\mathcal{V}} h_i(x)$, where $\mathcal{V} = \{1,\ldots,n\}$ is the collection of agents,
and $h_i(x)$ is agent $i$'s local cost function, which is initially known only to agent $i$. This problem finds
applications in fault-tolerant large-scale distributed machine learning, where data
are generated at different locations and some data may be lost during processing or be tampered with by
malicious local data managers. The global cost function $\frac{1}{n}\sum_{i\in\mathcal{V}} h_i(x)$ captures the requirement that,
in distributed machine learning, the system tries to take full advantage of all the data generated at
different locations. Since the above global cost function cannot be optimized exactly in the presence of crash
faults or Byzantine faults, we define two weaker versions of the problem for crash faults and Byzantine
faults, respectively.

When some agents may crash, the local functions/data stored at these agents may not always be available
to the system. In this scenario, the goal for the weaker problem is to generate an output that is an
optimum of a function formed as

$$C\left(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\right),$$

where $\mathcal{N}$ is the set of non-faulty agents, $\mathcal{F}$ is the set of faulty agents (crashed agents), $0 \le \alpha_i \le 1$
for each $i\in\mathcal{F}$, and $C$ is a normalization constant such that $C\left(|\mathcal{N}| + \sum_{i\in\mathcal{F}}\alpha_i\right) = 1$. We present an
iterative algorithm in which each agent only needs to perform local computation and send one message
per iteration.

When some agents may be Byzantine, the system cannot take full advantage of the data kept by
non-faulty agents. The goal for the associated weaker problem is to generate an output that is an optimum
of a function formed as

$$\sum_{i\in\mathcal{N}} \alpha_i h_i(x),$$

such that $\alpha_i \ge 0$ for each $i\in\mathcal{N}$ and $\sum_{i\in\mathcal{N}}\alpha_i = 1$. We present an iterative algorithm, where only local
computation is needed and only one message per agent is sent in each iteration, that ensures that at
least $|\mathcal{N}| - f$ agents have weights ($\alpha_i$'s) that are lower bounded by $\frac{1}{2(|\mathcal{N}|-f)}$.

The obtained results can be generalized to asynchronous systems as well. 


This research is supported in part by National Science Foundation awards NSF 1329681 and 1421918. Any opinions, 
findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect 
the views of the funding agencies or the U.S. government. 
1 System Model and Problem Formulation

The system under consideration is synchronous and consists of $n$ agents connected by a complete
communication network. Our results can be generalized to asynchronous systems; we postpone the
discussion of this generalization to the end of this report. The set of agents is $\mathcal{V} = \{1,\ldots,n\}$. We
assume that $n > 3f$ for reasons that will be clearer soon. We say that a function $h:\mathbb{R}\to\mathbb{R}$ is
admissible if (i) $h(\cdot)$ is convex and continuously differentiable, (ii) the set $\arg\min_{x\in\mathbb{R}} h(x)$ containing
the optima of $h(\cdot)$ is non-empty and compact (i.e., bounded and closed), and (iii) the magnitude of
the gradient is bounded by $L$, i.e., $|h'(x)| \le L$, $\forall x\in\mathbb{R}$, and the derivative $h'(\cdot)$ is $L$-Lipschitz
continuous. Each agent $i\in\mathcal{V}$ is initially provided with an admissible local cost function $h_i:\mathbb{R}\to\mathbb{R}$.
Ideally, the system goal is to optimize the average of all the local functions and to have all the agents
reach agreement on the optimum. In particular, each agent should output an identical value
$\tilde{x}\in\mathbb{R}$ that minimizes

$$\frac{1}{n}\sum_{i\in\mathcal{V}} h_i(x). \tag{1}$$

This problem finds applications in large-scale distributed machine learning, where
data are generated at different locations and the data center at each location is not allowed to
transmit all the locally collected data to other centers, due either to transmission capacity constraints
or to privacy concerns.

This problem is well-studied in the scenario where each agent is reliable throughout any execution
of an algorithm [7,15,21]. In this work, we consider the fault-tolerant version of this problem.
In particular, we consider the setting where up to $f$ of the $n$ agents may crash or be Byzantine
faulty. Let $\mathcal{F}$ denote the set of faulty agents, and let $\mathcal{N} = \mathcal{V} - \mathcal{F}$ denote the set of non-faulty
agents. For each $t \ge 0$, let $\mathcal{N}[t]$ be the collection of agents that have not crashed by the end
of iteration $t$, with $\mathcal{N}[0] = \mathcal{V}$. Note that $\mathcal{N}[t+1] \subseteq \mathcal{N}[t]$ for $t \ge 0$, and that $\lim_{t\to\infty}\mathcal{N}[t] = \mathcal{N}$. The
set $\mathcal{F}$ of faulty agents may be chosen by an adversary arbitrarily. Let $|\mathcal{F}| = \phi$. Note that $\phi \le f$ and
$|\mathcal{N}| \ge n - f$. The presence of crashed or Byzantine faulty agents makes it impossible to design an
algorithm that solves (1) for all admissible local cost functions (this is shown formally in Part I of
this report [19]). Therefore, for crash faults and Byzantine faults, respectively, we study two weaker
versions of the problem, namely, Problem 1 and Problem 2 in Figure 1. Problem 1 is proposed in
this report, and Problem 2 was first introduced in Part I of this report [19].

When some agents may crash, the local functions/data stored at these agents may only be
partially visible or even invisible to the system, as agent $i$ may crash at any time during an
execution. Problem 1 requires that the output $\tilde{x}$ be an optimum of a function formed as

$$C\left(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\right),$$

where $\mathcal{N}$ is the set of non-faulty agents, $\mathcal{F}$ is the set of faulty agents (crashed agents), $0 \le \alpha_i \le 1$
for each $i\in\mathcal{F}$, and $C$ is a normalization constant such that $C\left(|\mathcal{N}| + \sum_{i\in\mathcal{F}}\alpha_i\right) = 1$.

When some agents may be Byzantine, the system cannot take full advantage of the data kept
by non-faulty agents. In addition, among the non-faulty agents, the system may put more weight
on some agents than on others. The desired goal is then to maximize the number of weights ($\alpha_i$'s)
that are bounded away from zero. With this in mind, Problem 2 in Figure 1 is introduced in Part
I of this report [19]. In Problem 2, note that $\mathbf{1}\{\alpha_i \ge \beta\}$ is an indicator function that outputs 1
if $\alpha_i \ge \beta$, and 0 otherwise. Essentially, Problem 2 requires that at least $\gamma$ weights must reach a
threshold $\beta$, where $\beta > 0$. Thus, $\beta, \gamma$ are parameters of Problem 2, capturing how the data collected
by non-faulty agents are utilized by the system.


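Problem 1

$$\tilde{x} \in \arg\min_{x\in\mathbb{R}} \; C\left(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\right)$$

such that

$$\forall i\in\mathcal{F},\ 0 \le \alpha_i \le 1 \quad\text{and}\quad C\left(|\mathcal{N}| + \sum_{i\in\mathcal{F}}\alpha_i\right) = 1.$$

Problem 2 with parameters $\beta, \gamma$, $\beta > 0$

$$\tilde{x} \in \arg\min_{x\in\mathbb{R}} \; \sum_{i\in\mathcal{N}} \alpha_i h_i(x)$$

such that

$$\forall i\in\mathcal{N},\ \alpha_i \ge 0, \qquad \sum_{i\in\mathcal{N}}\alpha_i = 1, \qquad\text{and}\qquad \sum_{i\in\mathcal{N}} \mathbf{1}(\alpha_i \ge \beta) \ge \gamma.$$

Fig. 1: Problem formulations: All non-faulty agents must output an identical value $\tilde{x}\in\mathbb{R}$ that
satisfies the constraints specified in each problem formulation.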
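
The constraints of Problem 2 can be checked mechanically for a candidate weight vector. The sketch below is only an illustration: the function name `satisfies_problem2`, the tolerance `tol`, and the sample weights are assumptions made for this example, and the threshold test is taken to be $\alpha_i \ge \beta$.

```python
def satisfies_problem2(weights, beta, gamma, tol=1e-12):
    """Check the Problem 2 constraints for weights alpha_i of non-faulty agents.

    Hypothetical helper: requires alpha_i >= 0, sum(alpha_i) = 1 (up to tol),
    and at least `gamma` weights that are >= `beta`.
    """
    nonnegative = all(a >= 0.0 for a in weights)
    sums_to_one = abs(sum(weights) - 1.0) <= tol
    heavy = sum(1 for a in weights if a >= beta)
    return nonnegative and sums_to_one and heavy >= gamma


# Assumed example: n = 7 agents, f = 2, so |N| = 5 non-faulty agents.
num_nonfaulty, f = 5, 2
beta = 1.0 / (2 * (num_nonfaulty - f))   # 1 / (2(|N| - f))
gamma = num_nonfaulty - f                # |N| - f

ok = satisfies_problem2([0.25, 0.25, 0.25, 0.125, 0.125], beta, gamma)
too_concentrated = satisfies_problem2([0.5, 0.5, 0.0, 0.0, 0.0], beta, gamma)
```

In the first weight vector three weights reach the threshold, so the constraint with $\gamma = 3$ holds; in the second only two do.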


We will say that Problem 1 or 2 is solvable if there exists an algorithm that finds a solution
for the problem (satisfying all its constraints) for all admissible local cost functions and all
possible behaviors of faulty agents. Our problem formulations require that all non-faulty agents
output asymptotically identical $\tilde{x}\in\mathbb{R}$, while satisfying the constraints imposed by the problem (as
listed in Figure 1). Thus, the traditional fault-tolerant consensus [9] problem, which also imposes
a similar agreement condition, is a special case of our optimization problem.¹ Therefore, the lower
bound of $n > 3f$ for Byzantine consensus [9] also applies to our problem. Hence we assume that
$n > 3f$.

We prove the following key results: 


- (Theorem 2) We provide a simple iterative algorithm that solves Problem 1. In each iteration
of this algorithm, each agent only needs to perform local computation and send one message.

- (Theorem 4) We present a simple iterative algorithm that solves Problem 2 with $\beta \le \frac{1}{2(|\mathcal{N}|-f)}$
and $\gamma \le |\mathcal{N}| - f$. In each iteration of this algorithm, each agent only needs to perform local
computation and send one message.

In our proposed algorithms, the local estimates at all non-faulty agents are identical in the limit. 

The rest of the report is organized as follows. Related work is summarized in Section 2. Two
algorithms are proposed in Section 3: the first algorithm solves Problem 1 with two rounds
of information exchange in each iteration, whereas the second algorithm solves Problem 1 with
one message sent per agent in each iteration. Section 4 presents a simple iterative algorithm that
solves Problem 2 with $\beta = \frac{1}{2(|\mathcal{N}|-f)}$ and $\gamma = |\mathcal{N}| - f$. Similar to the second algorithm in Section
3, this proposed algorithm only requires one message sent per agent in each iteration. Section 5
discusses the generalization of the obtained results to asynchronous systems and concludes the
report.


¹ Interested readers are referred to Part I of this report [19] for a formal proof.









2 Related Work 

Fault-tolerant consensus [16] is a special case of the optimization problem considered in this report. 
There is a significant body of work on fault-tolerant consensus, including [6,5,14,8,12,23,10]. The 
optimization algorithms presented in this report use Byzantine consensus as a component. 

Convex optimization, including distributed convex optimization, also has a long history [3].
However, we are not aware of prior work that obtains the results presented in this report except
[19,20]. Primal and dual decomposition methods that lend themselves naturally to a distributed
paradigm are well-known [4]. There has been significant research on a variant of the distributed
optimization problem [7,15,21], in which the global objective $h(x)$ is a summation of $n$ convex functions,
i.e., $h(x) = \sum_{j=1}^{n} h_j(x)$, with function $h_j(x)$ being known to the $j$-th agent. The need for robustness
in distributed optimization problems has received some attention recently [7,11,24,13,19,20].
In particular, Duchi et al. [7] studied the impact of random communication link faults on the
convergence of a distributed variant of the dual averaging algorithm. Specifically, each realizable link
fault pattern considered in [7] is assumed to admit a doubly-stochastic matrix which governs the
evolution dynamics of local estimates of the optimum.

We considered Byzantine faults in [19] and [20]. Both [19] and [20] considered synchronous
systems. [19] showed that at most $|\mathcal{N}| - f$ non-faulty functions can have non-zero weights. This
observation led to the formulation of Problem 2 in Fig. 1. Six algorithms were proposed in [19]. In
contrast, we also showed [20] that, given sufficient redundancy in the input functions (each input function
is not exclusively kept by a single agent), it is possible to solve (1), where the summation is over
all input functions. In addition, a simple low-complexity iterative algorithm was proposed in [20],
and a tight topological condition for the existence of such iterative algorithms was identified.

In other related work, significant attempts have been made to solve the problem of distributed
hypothesis testing in the presence of Byzantine attacks [11,24,13], where Byzantine sensors may
transmit fictitious observations aimed at confusing the decision maker into arriving at a judgment that
contradicts the true underlying distribution. A consensus-based variant of distributed event
detection, where a centralized data fusion center does not exist, is considered in [11]. In contrast,
in this report, we focus on Byzantine attacks on the multi-agent optimization problem.

3 Multi-Agent Optimization with Crash Faults

Algorithm 1 and its correctness proof contain the key ideas and intuition of this report. 

3.1 Algorithm 1: Two-Round of Information Exchange per Iteration 

In Algorithm 1, each agent $j$ maintains two variables: the local estimate $x_j$ and the auxiliary variable
$s_j$, with $x_j[t]$ and $s_j[t]$ representing these two variables at the end of iteration $t$, $x_j[0]$ being the
system input at agent $j$, and $s_j[0] = 0$. In each iteration $t \ge 1$, there are two rounds of information
exchange. In the first round, (1) agent $j$ requests all the agents (including itself) to compute the
gradients of their local functions at $x_j[t-1]$; (2) after receiving $x_j[t-1]$, a non-faulty agent $i$
computes $h_i'(x_j[t-1])$ and sends it back to agent $j$; (3) agent $j$ collects the requested gradients
and updates the auxiliary variable $s_j$. In the second round, all the non-faulty agents exchange their
auxiliary variables $s_j[t]$ and update their local estimates as an average of all received auxiliary
variables.

Let $\{\lambda[t]\}_{t=0}^{\infty}$ be a sequence of stepsizes chosen beforehand such that $\lambda[t] > 0$ and $\lambda[t] \ge \lambda[t+1]$
for each $t \ge 0$, $\lim_{t\to\infty}\lambda[t] = 0$, $\sum_{t=0}^{\infty}\lambda[t] = \infty$, and $\sum_{t=0}^{\infty}\lambda^2[t] < \infty$.
Algorithm 1 for agent $j$ at iteration $t$:

Step 1: Send $x_j[t-1]$ to all the agents (including agent $j$ itself).

Step 2: Upon receiving $x_i[t-1]$ from agent $i$, compute $h_j'(x_i[t-1])$ (the gradient of function $h_j(\cdot)$
at $x_i[t-1]$) and send it back to agent $i$.

Step 3: Let $\mathcal{R}^1_j[t-1]$ denote the set of gradients of the form $h_i'(x_j[t-1])$ received as a result of
step 1 and step 2. Update $s_j$ as

$$s_j[t] = x_j[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_j[t-1]|} \sum_{i\in\mathcal{R}^1_j[t-1]} h_i'(x_j[t-1]). \tag{2}$$

Step 4: Send $s_j[t]$ to all the agents (including agent $j$ itself).

Step 5: Let $\mathcal{R}^2_j[t-1]$ denote the set of auxiliary variables $s_i[t]$ received as a result of step 4.
Update $x_j$ as

$$x_j[t] = \frac{1}{|\mathcal{R}^2_j[t-1]|} \sum_{i\in\mathcal{R}^2_j[t-1]} s_i[t]. \tag{3}$$


Steps 1, 2 and 3 correspond to the first round of information exchange, and step 4 corresponds
to the second round of information exchange. We will show that Algorithm 1 correctly solves Problem 1.
Intuitively speaking, the first round of information exchange corresponds to the standard
gradient-method iterate, which drives each local estimate to a global optimum; the second round of
information exchange forces all local estimates at non-faulty agents to reach consensus. Algorithm
2 will achieve a similar goal with a single round of exchange.
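
The two rounds of Algorithm 1 can be simulated in a few lines. The following is a minimal sketch under stated assumptions, not the report's implementation: the local costs $h_i(x) = \sqrt{1 + (x - a_i)^2}$ (admissible with $L = 1$), the stepsize $\lambda[t] = (t+1)^{-3/4}$, the inputs $x_j[0] = a_j$, and the crash model (a faulty agent stops during step 4 after reaching only some receivers) are all choices made for this illustration, as are the names `grad`, `step_size`, `run_algorithm1`, and `crash_plan`.

```python
import math


def grad(center, x):
    """Derivative of h(x) = sqrt(1 + (x - center)^2): bounded by 1, 1-Lipschitz."""
    u = x - center
    return u / math.sqrt(1.0 + u * u)


def step_size(t):
    """Assumed stepsize: positive, non-increasing, divergent sum, summable squares."""
    return 1.0 / (t + 1) ** 0.75


def run_algorithm1(centers, crash_plan, iterations):
    """Simulate Algorithm 1 with crash faults.

    crash_plan maps a faulty agent to (iteration, receivers): the agent crashes
    during step 4 of that iteration after reaching only `receivers`.
    Returns the final estimates of the agents that never crashed.
    """
    x = {j: float(c) for j, c in enumerate(centers)}  # assumed inputs x_j[0]
    alive = set(x)
    for t in range(1, iterations + 1):
        lam = step_size(t - 1)
        order = sorted(alive)
        # Round 1 (steps 1-3): gradient step using gradients from live agents.
        s = {}
        for j in order:
            grads = [grad(centers[i], x[j]) for i in order]
            s[j] = x[j] - lam * sum(grads) / len(grads)
        # Round 2 (steps 4-5): average the auxiliary variables that arrive.
        crashing = {j for j in alive if j in crash_plan and crash_plan[j][0] == t}
        survivors = alive - crashing
        new_x = {}
        for j in survivors:
            received = [s[i] for i in order
                        if i in survivors or j in crash_plan[i][1]]
            new_x[j] = sum(received) / len(received)
        x, alive = new_x, survivors
    return x


# Four agents; agent 3 crashes in iteration 3 after reaching only agent 0.
final = run_algorithm1(centers=[0.0, 1.0, 2.0, 3.0],
                       crash_plan={3: (3, {0})},
                       iterations=2000)
spread = max(final.values()) - min(final.values())
```

In this run the three surviving agents end with identical estimates close to 1, the minimizer of $h_0 + h_1 + h_2$, which is one of the valid objectives of Problem 1 (all crashed-agent coefficients equal to zero).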


Recall that $\mathcal{N}$ is the set of non-faulty agents and $\mathcal{F}$ is the set of faulty agents that may crash
at any time during an execution. In Problem 1, the system goal is to optimize

$$C\left(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\right),$$

where $0 \le \alpha_i \le 1$ for each $i\in\mathcal{F}$ and $C$ is a normalization constant such that
$C\left(|\mathcal{N}| + \sum_{i\in\mathcal{F}}\alpha_i\right) = 1$. Given $\mathcal{N}$ and
$\mathcal{F}$, the normalization constant $C$ and the crashed agents' coefficients $\alpha_i$ depend on when the faulty
agents crash during an execution. For given $\mathcal{N}$ and $\mathcal{F}$, let $\mathcal{C}$ be the collection of potential system
objectives, formally defined as follows:

$$\mathcal{C} = \left\{ p(x) : p(x) = C\left(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\right),\ \forall i\in\mathcal{F},\ 0 \le \alpha_i \le 1,\ C\left(|\mathcal{N}| + \sum_{i\in\mathcal{F}}\alpha_i\right) = 1 \right\}. \tag{4}$$
Each $p(x)\in\mathcal{C}$ is called a valid function. Since $\forall i\in\mathcal{F}$, $0 \le \alpha_i \le 1$, it holds that $\frac{1}{n} \le C \le \frac{1}{|\mathcal{N}|}$ for
each valid function. Note that $\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x) \in \mathcal{C}$ is a valid function. For ease of future reference, we
let $\tilde{p}(x) = \frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x)$. Define $Y = \cup_{p(x)\in\mathcal{C}} \arg\min p(x)$. The characterization of $Y$ is presented
in the following two lemmas.

Lemma 1. $Y$ is a convex set.

Lemma 2. $Y$ is a closed set.

The proofs of Lemma 1 and Lemma 2 are presented in Appendix A and Appendix B, respectively.
In addition, since $Y$ is convex, $\mathrm{Dist}(\cdot, Y)$ is also convex.
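
The range of the normalization constant can be illustrated numerically. The sketch below is an assumed example only: the helper names `normalization_constant`, `valid_function`, and `cost`, the sample costs $\sqrt{1 + (x - a)^2}$, and the coefficient $\alpha = 0.5$ are not taken from the report.

```python
import math


def normalization_constant(num_nonfaulty, alphas):
    """Return C with C * (|N| + sum(alpha_i)) = 1, for alpha_i in [0, 1]."""
    if any(a < 0.0 or a > 1.0 for a in alphas):
        raise ValueError("each alpha_i must lie in [0, 1]")
    return 1.0 / (num_nonfaulty + sum(alphas))


def valid_function(nonfaulty_costs, faulty_costs, alphas):
    """Build p(x) = C * (sum of non-faulty costs + weighted faulty costs)."""
    c = normalization_constant(len(nonfaulty_costs), alphas)

    def p(x):
        total = sum(h(x) for h in nonfaulty_costs)
        total += sum(a * h(x) for a, h in zip(alphas, faulty_costs))
        return c * total

    return p


def cost(center):
    """Assumed sample local cost, sqrt(1 + (x - center)^2)."""
    return lambda x: math.sqrt(1.0 + (x - center) ** 2)


# Assumed setting: n = 4, three non-faulty agents, one crashed agent.
nonfaulty = [cost(0.0), cost(1.0), cost(2.0)]
faulty = [cost(3.0)]
C = normalization_constant(len(nonfaulty), [0.5])
p = valid_function(nonfaulty, faulty, [0.5])
```

Here $C = 1/3.5$, which lies between $1/n = 1/4$ and $1/|\mathcal{N}| = 1/3$ as stated above.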


Asymptotic Consensus under Algorithm 1 We first show that asymptotic consensus among 
the non-faulty agents is achieved under Algorithm 1. The following proposition is used in proving 
consensus. 

Proposition 1. Let $0 < b < 1$. Define $\ell(t) = \sum_{r=0}^{t-1} \lambda[r]\, b^{t-r}$. Then the limit of $\ell(t)$ exists and

$$\lim_{t\to\infty} \ell(t) = 0.$$

Proposition 1 is proved in Appendix C. 
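
A quick numerical check of the kind of decay that Proposition 1 describes is sketched below, taking $\ell(t)$ to be the geometric-weighted sum $\sum_{r=0}^{t-1}\lambda[r]\,b^{t-r}$ that appears in the bound (11) and using the recursion $\ell(t+1) = b\,(\ell(t) + \lambda[t])$, which follows directly from that form. The stepsize $\lambda[r] = 1/(r+1)$, the value $b = 0.5$, and the name `ell_sequence` are assumptions for this illustration.

```python
def ell_sequence(step, b, horizon):
    """Return [l(0), ..., l(horizon)] with l(t) = sum_{r<t} step(r) * b**(t - r).

    Uses the recursion l(t + 1) = b * (l(t) + step(t)), with l(0) = 0.
    """
    values = [0.0]
    for t in range(horizon):
        values.append(b * (values[-1] + step(t)))
    return values


def harmonic_step(r):
    """Assumed stepsize 1 / (r + 1)."""
    return 1.0 / (r + 1)


values = ell_sequence(harmonic_step, b=0.5, horizon=2000)
# Direct evaluation of the defining sum at t = 10, to cross-check the recursion.
direct_10 = sum(harmonic_step(r) * 0.5 ** (10 - r) for r in range(10))
```

The late values of the sequence are much smaller than the early ones, consistent with the limit being zero.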


Recall that for each $t \ge 0$, $\mathcal{N}[t]$ is the collection of agents that have not crashed by
the end of iteration $t$. Note that $\mathcal{N}[t+1] \subseteq \mathcal{N}[t]$ for $t \ge 0$, and that $\lim_{t\to\infty}\mathcal{N}[t] = \mathcal{N}$. Denote
$M[t] = \max_{i\in\mathcal{N}[t]} x_i[t]$ and $m[t] = \min_{i\in\mathcal{N}[t]} x_i[t]$.

Lemma 3. Under Algorithm 1, the sequence $\{M[t] - m[t]\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty} (M[t] - m[t]) = 0.$$

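Proof. Let $i, j\in\mathcal{N}[t]$ be such that $x_i[t] = M[t]$ and $x_j[t] = m[t]$. Then

$$\begin{aligned}
M[t] - m[t] &= x_i[t] - x_j[t] \\
&= \frac{1}{|\mathcal{R}^2_i[t-1]|}\sum_{k\in\mathcal{R}^2_i[t-1]} s_k[t] - \frac{1}{|\mathcal{R}^2_j[t-1]|}\sum_{p\in\mathcal{R}^2_j[t-1]} s_p[t] \qquad \text{by (3)} \\
&= \min\left\{\frac{1}{|\mathcal{R}^2_i[t-1]|}, \frac{1}{|\mathcal{R}^2_j[t-1]|}\right\}\left(\sum_{k\in\mathcal{R}^2_i[t-1]} s_k[t] - \sum_{p\in\mathcal{R}^2_j[t-1]} s_p[t]\right) \\
&\quad + \left(\frac{1}{|\mathcal{R}^2_i[t-1]|} - \min\left\{\frac{1}{|\mathcal{R}^2_i[t-1]|}, \frac{1}{|\mathcal{R}^2_j[t-1]|}\right\}\right)\sum_{k\in\mathcal{R}^2_i[t-1]} s_k[t] \\
&\quad - \left(\frac{1}{|\mathcal{R}^2_j[t-1]|} - \min\left\{\frac{1}{|\mathcal{R}^2_i[t-1]|}, \frac{1}{|\mathcal{R}^2_j[t-1]|}\right\}\right)\sum_{p\in\mathcal{R}^2_j[t-1]} s_p[t].
\end{aligned} \tag{5}$$

Assume that $|\mathcal{R}^2_i[t-1]| \ge |\mathcal{R}^2_j[t-1]|$. The case that $|\mathcal{R}^2_i[t-1]| < |\mathcal{R}^2_j[t-1]|$ can be shown
similarly.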
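We can simplify (5) as follows.

$$M[t] - m[t] = \frac{1}{|\mathcal{R}^2_i[t-1]|}\left(\sum_{k\in\mathcal{R}^2_i[t-1]} s_k[t] - \sum_{p\in\mathcal{R}^2_j[t-1]} s_p[t]\right) - \left(\frac{1}{|\mathcal{R}^2_j[t-1]|} - \frac{1}{|\mathcal{R}^2_i[t-1]|}\right)\sum_{p\in\mathcal{R}^2_j[t-1]} s_p[t]. \tag{6}$$

For each $k$, we get

$$\begin{aligned}
s_k[t] &= x_k[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_k[t-1]|}\sum_{i\in\mathcal{R}^1_k[t-1]} h_i'(x_k[t-1]) \\
&\le x_k[t-1] + \frac{\lambda[t-1]}{|\mathcal{R}^1_k[t-1]|}\,|\mathcal{R}^1_k[t-1]|\,L \qquad \text{since } |h_i'(x)| \le L,\ \forall x\in\mathbb{R},\ \forall i\in\mathcal{V} \\
&= x_k[t-1] + \lambda[t-1]L \le M[t-1] + \lambda[t-1]L.
\end{aligned} \tag{7}$$

Similarly, for each $k\in\mathcal{N}[t]$, it holds that

$$s_k[t] \ge m[t-1] - \lambda[t-1]L. \tag{8}$$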

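We bound the two terms on the right-hand side of (6) separately. For brevity, write
$\mathcal{R}_i = \mathcal{R}^2_i[t-1]$ and $\mathcal{R}_j = \mathcal{R}^2_j[t-1]$. For the first term of (6), we
get

$$\begin{aligned}
\frac{1}{|\mathcal{R}_i|}\left(\sum_{k\in\mathcal{R}_i} s_k[t] - \sum_{p\in\mathcal{R}_j} s_p[t]\right)
&= \frac{1}{|\mathcal{R}_i|}\left(\sum_{k\in\mathcal{R}_i\cap\mathcal{R}_j} s_k[t] + \sum_{k\in\mathcal{R}_i-\mathcal{R}_j} s_k[t] - \sum_{p\in\mathcal{R}_j\cap\mathcal{R}_i} s_p[t] - \sum_{p\in\mathcal{R}_j-\mathcal{R}_i} s_p[t]\right) \\
&= \frac{1}{|\mathcal{R}_i|}\left(\sum_{k\in\mathcal{R}_i-\mathcal{R}_j} s_k[t] - \sum_{p\in\mathcal{R}_j-\mathcal{R}_i} s_p[t]\right) \\
&\overset{(a)}{\le} \frac{1}{|\mathcal{R}_i|}\left(\sum_{k\in\mathcal{R}_i-\mathcal{R}_j} (M[t-1] + \lambda[t-1]L) - \sum_{p\in\mathcal{R}_j-\mathcal{R}_i} (m[t-1] - \lambda[t-1]L)\right) \\
&= \frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|}\,(M[t-1] + \lambda[t-1]L) - \frac{|\mathcal{R}_j-\mathcal{R}_i|}{|\mathcal{R}_i|}\,(m[t-1] - \lambda[t-1]L).
\end{aligned} \tag{9}$$

Inequality (a) holds due to (7) and (8).

For the second term of (6), we get

$$\begin{aligned}
-\left(\frac{1}{|\mathcal{R}_j|} - \frac{1}{|\mathcal{R}_i|}\right)\sum_{p\in\mathcal{R}_j} s_p[t]
&= -\frac{|\mathcal{R}_i| - |\mathcal{R}_j|}{|\mathcal{R}_i|\,|\mathcal{R}_j|}\sum_{p\in\mathcal{R}_j} s_p[t] \\
&\le -\frac{|\mathcal{R}_i| - |\mathcal{R}_j|}{|\mathcal{R}_i|}\,(m[t-1] - \lambda[t-1]L),
\end{aligned} \tag{10}$$

where the inequality uses (8) and $|\mathcal{R}_i| \ge |\mathcal{R}_j|$.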

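By (9) and (10), (6) can be further bounded as follows, where $\mathcal{R}_i = \mathcal{R}^2_i[t-1]$ and $\mathcal{R}_j = \mathcal{R}^2_j[t-1]$.

$$\begin{aligned}
M[t] - m[t] &\le \frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|}\,(M[t-1] + \lambda[t-1]L) - \frac{|\mathcal{R}_j-\mathcal{R}_i|}{|\mathcal{R}_i|}\,(m[t-1] - \lambda[t-1]L) \\
&\quad - \frac{|\mathcal{R}_i| - |\mathcal{R}_j|}{|\mathcal{R}_i|}\,(m[t-1] - \lambda[t-1]L) \qquad \text{by (9) and (10)} \\
&\overset{(a)}{=} \frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|}\,(M[t-1] + \lambda[t-1]L) - \frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|}\,(m[t-1] - \lambda[t-1]L) \\
&= \frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|}\,(M[t-1] - m[t-1] + 2\lambda[t-1]L) \\
&\overset{(b)}{\le} \frac{f}{n-f}\,(M[t-1] - m[t-1] + 2\lambda[t-1]L) \\
&\le \left(\frac{f}{n-f}\right)^{t}(M[0] - m[0]) + 2L\sum_{r=0}^{t-1}\lambda[r]\left(\frac{f}{n-f}\right)^{t-r}.
\end{aligned} \tag{11}$$

Equality (a) is true because

$$|\mathcal{R}_j-\mathcal{R}_i| + |\mathcal{R}_i| - |\mathcal{R}_j| = |\mathcal{R}_j| - |\mathcal{R}_j\cap\mathcal{R}_i| + |\mathcal{R}_i| - |\mathcal{R}_j| = |\mathcal{R}_i-\mathcal{R}_j|.$$

Since $|\mathcal{F}| \le f$, it holds that $|\mathcal{R}_i-\mathcal{R}_j| \le f$ and that $|\mathcal{R}_i| \ge n - f$. Thus,

$$\frac{|\mathcal{R}_i-\mathcal{R}_j|}{|\mathcal{R}_i|} \le \frac{f}{n-f},$$

and inequality (b) holds. The last inequality in (11) follows by applying the recursion repeatedly down to iteration 0.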


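It follows from Proposition 1 that

$$\lim_{t\to\infty}\left(\sum_{r=0}^{t-1}\lambda[r]\left(\frac{f}{n-f}\right)^{t-r}\right) = 0,$$

where $b = \frac{f}{n-f}$; note that $0 < b < 1$ when $f \ge 1$, since $n > 3f$. Thus, taking limit sup on both sides of (11), we get

$$\limsup_{t\to\infty}\,(M[t] - m[t]) \le \lim_{t\to\infty}\left(\left(\frac{f}{n-f}\right)^{t}(M[0] - m[0])\right) + 2L\lim_{t\to\infty}\left(\sum_{r=0}^{t-1}\lambda[r]\left(\frac{f}{n-f}\right)^{t-r}\right) = 0.$$

On the other hand, by definition of $M[t] - m[t]$, for each $t \ge 0$ we get $M[t] - m[t] \ge 0$. Thus

$$\liminf_{t\to\infty}\,(M[t] - m[t]) \ge 0.$$

Then, we obtain

$$\limsup_{t\to\infty}\,(M[t] - m[t]) \le 0 \le \liminf_{t\to\infty}\,(M[t] - m[t]).$$

Thus, the limit of $M[t] - m[t]$ exists and

$$\lim_{t\to\infty}\,(M[t] - m[t]) = 0.$$

□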


Recall that $M[t] = \max_{i\in\mathcal{N}[t]} x_i[t]$ and $m[t] = \min_{i\in\mathcal{N}[t]} x_i[t]$. Lemma 3 implies that asymptotic
consensus is achieved under Algorithm 1. The following lemma is used in the correctness proof of
Algorithm 1.

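Lemma 4. Under Algorithm 1, the following holds.

$$\sum_{t=0}^{\infty}\lambda[t]\,(M[t] - m[t]) < \infty.$$

Proof. Since

$$\sum_{t=0}^{\infty}\lambda[t]\,(M[t] - m[t]) = \lambda[0]\,(M[0] - m[0]) + \sum_{t=1}^{\infty}\lambda[t]\,(M[t] - m[t]),$$

and $\lambda[0]\,(M[0] - m[0]) < \infty$, to show Lemma 4, it is enough to show that

$$\sum_{t=1}^{\infty}\lambda[t]\,(M[t] - m[t]) < \infty.$$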
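$$\begin{aligned}
\sum_{t=1}^{\infty}\lambda[t]\,(M[t] - m[t])
&\le \sum_{t=1}^{\infty}\lambda[t]\left(\left(\frac{f}{n-f}\right)^{t}(M[0] - m[0]) + 2L\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda[r]\right) \qquad \text{by (11)} \\
&= (M[0] - m[0])\sum_{t=1}^{\infty}\lambda[t]\left(\frac{f}{n-f}\right)^{t} + 2L\sum_{t=1}^{\infty}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda[t]\lambda[r] \\
&\overset{(a)}{\le} (M[0] - m[0])\sum_{t=1}^{\infty}\lambda[t]\left(\frac{f}{n-f}\right)^{t} + L\sum_{t=1}^{\infty}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\left(\lambda^2[t] + \lambda^2[r]\right) \\
&= (M[0] - m[0])\sum_{t=1}^{\infty}\lambda[t]\left(\frac{f}{n-f}\right)^{t} + L\sum_{t=1}^{\infty}\lambda^2[t]\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r} \\
&\quad + L\sum_{t=1}^{\infty}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda^2[r].
\end{aligned} \tag{12}$$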

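Inequality (a) holds because $\lambda[t]\lambda[r] \le \frac{\lambda^2[t] + \lambda^2[r]}{2}$. We bound the three terms in the RHS of (12)
separately.

The first term of (12): Since $\lambda[t] \le \lambda[0]$ for each $t \ge 1$, we have

$$\begin{aligned}
(M[0] - m[0])\sum_{t=1}^{\infty}\lambda[t]\left(\frac{f}{n-f}\right)^{t}
&\le (M[0] - m[0])\,\lambda[0]\sum_{t=1}^{\infty}\left(\frac{f}{n-f}\right)^{t} \\
&\le (M[0] - m[0])\,\lambda[0]\,\frac{1}{1 - \frac{f}{n-f}} \\
&= (M[0] - m[0])\,\lambda[0]\,\frac{n-f}{n-2f} < \infty.
\end{aligned} \tag{13}$$

The second term of (12):

$$\begin{aligned}
L\sum_{t=1}^{\infty}\lambda^2[t]\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}
&= L\sum_{t=1}^{\infty}\lambda^2[t]\sum_{r=1}^{t}\left(\frac{f}{n-f}\right)^{r} \\
&\le \frac{n-f}{n-2f}\,L\sum_{t=1}^{\infty}\lambda^2[t] \\
&< \infty.
\end{aligned} \tag{14}$$

The last inequality follows from the fact that $\sum_{t=1}^{\infty}\lambda^2[t] \le \sum_{t=0}^{\infty}\lambda^2[t] < \infty$.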
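The third term of (12): For any fixed $T$, we get

$$\begin{aligned}
L\sum_{t=1}^{T}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda^2[r]
&= L\sum_{r=0}^{T-1}\lambda^2[r]\sum_{t=r+1}^{T}\left(\frac{f}{n-f}\right)^{t-r} \\
&\le L\sum_{r=0}^{T-1}\lambda^2[r]\sum_{t=0}^{\infty}\left(\frac{f}{n-f}\right)^{t} \\
&= \frac{n-f}{n-2f}\,L\sum_{r=0}^{T-1}\lambda^2[r].
\end{aligned}$$

Letting $T\to\infty$, we get

$$L\sum_{t=1}^{\infty}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda^2[r] \le \frac{n-f}{n-2f}\,L\sum_{r=0}^{\infty}\lambda^2[r] < \infty. \tag{15}$$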

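We get

$$\begin{aligned}
\sum_{t=1}^{\infty}\lambda[t]\,(M[t] - m[t])
&\le (M[0] - m[0])\sum_{t=1}^{\infty}\lambda[t]\left(\frac{f}{n-f}\right)^{t} + L\sum_{t=1}^{\infty}\lambda^2[t]\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r} \\
&\quad + L\sum_{t=1}^{\infty}\sum_{r=0}^{t-1}\left(\frac{f}{n-f}\right)^{t-r}\lambda^2[r] \qquad \text{by (12)} \\
&< \infty \qquad \text{by (13), (14) and (15)},
\end{aligned}$$

proving the lemma.

□

By Lemma 4, we know there exists some constant $C_1$ such that for any $t \ge 0$

$$\sum_{\tau=t}^{\infty}\lambda[\tau]\,L\,(M[\tau] - m[\tau]) \le \sum_{\tau=0}^{\infty}\lambda[\tau]\,L\,(M[\tau] - m[\tau]) \le C_1. \tag{16}$$

The following corollary is an immediate consequence of Lemma 4.

Corollary 1. Under Algorithm 1,

$$\lim_{t\to\infty}\lambda[t]\,(M[t] - m[t]) = 0, \tag{17}$$

and

$$\lim_{t\to\infty}\sum_{\tau=t}^{\infty}\lambda[\tau]\,(M[\tau] - m[\tau]) = 0. \tag{18}$$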
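Proof. By Lemma 4, (17) holds trivially. Now we prove (18).

Let $F = \sum_{\tau=0}^{\infty}\lambda[\tau]\,(M[\tau] - m[\tau])$, and let $\{F_t\}_{t=0}^{\infty}$ be a sequence such that for each $t$,

$$F_t = \sum_{\tau=0}^{t-1}\lambda[\tau]\,(M[\tau] - m[\tau]).$$

Since $M[\tau] - m[\tau] \ge 0$ for each $\tau \ge 0$, by construction, it holds that $F_t \le F_{t+1}$ and that $F_t \le F$
for each $t \ge 0$. Thus, by the monotone convergence theorem (MCT), we know that

$$\lim_{t\to\infty} F_t = F.$$

Now, let

$$R_t = F - F_t = \sum_{\tau=0}^{\infty}\lambda[\tau]\,(M[\tau] - m[\tau]) - \sum_{\tau=0}^{t-1}\lambda[\tau]\,(M[\tau] - m[\tau]) = \sum_{\tau=t}^{\infty}\lambda[\tau]\,(M[\tau] - m[\tau]).$$

By Lemma 4, we know that $F < \infty$. Thus the sequence $R_t$ is well-defined. In addition, since the
sequence $F_t$ converges, the sequence $R_t$ also converges. So, we get

$$\lim_{t\to\infty}\left(\sum_{\tau=t}^{\infty}\lambda[\tau]\,(M[\tau] - m[\tau])\right) = \lim_{t\to\infty} R_t = \lim_{t\to\infty}(F - F_t) = F - \lim_{t\to\infty} F_t = F - F = 0,$$

proving (18).

□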


Optimality of Algorithm 1 

Definition 1. Given a sequence $\{x[t]\}_{t=0}^{\infty}$, a sequence of gradients $\{g[t]\}_{t=0}^{\infty}$, and a set of stepsizes
$\{\lambda[t]\}_{t=0}^{\infty}$, we say $x[t]$ is a resilient point with respect to gradient $g[t]$ if one of the following items is
true:

* $x[t]\in Y$ and $(x[t] - \lambda[t]g[t]) \notin Y$,

* $x[t] > \max Y$ and $(x[t] - \lambda[t]g[t]) < \min Y$,

* $x[t] < \min Y$ and $(x[t] - \lambda[t]g[t]) > \max Y$.

By Lemma 2, the set $Y$ is closed. Thus $\max Y$ and $\min Y$ exist, and Definition 1 is
well-defined over set $Y$.
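
Definition 1 translates directly into a small predicate once $Y$ is represented as an interval $[\min Y, \max Y]$ (it is convex and closed by Lemmas 1 and 2). The sketch below is illustrative only: the names `is_resilient` and `dist_to_interval` and the sample numbers are assumptions, not part of the report.

```python
def dist_to_interval(x, y_min, y_max):
    """Distance from the scalar x to the interval [y_min, y_max]."""
    return max(y_min - x, 0.0, x - y_max)


def is_resilient(x, g, lam, y_min, y_max):
    """Return True if x is a resilient point w.r.t. gradient g and stepsize lam.

    Mirrors the three items of Definition 1 for Y = [y_min, y_max]:
    the step x - lam * g either leaves Y or jumps across Y entirely.
    """
    nxt = x - lam * g
    x_in_y = y_min <= x <= y_max
    nxt_in_y = y_min <= nxt <= y_max
    return ((x_in_y and not nxt_in_y)
            or (x > y_max and nxt < y_min)
            or (x < y_min and nxt > y_max))


# Assumed example with Y = [0, 1].
leaves_y = is_resilient(x=0.5, g=2.0, lam=1.0, y_min=0.0, y_max=1.0)   # 0.5 -> -1.5
jumps_over = is_resilient(x=2.0, g=5.0, lam=1.0, y_min=0.0, y_max=1.0)  # 2.0 -> -3.0
moves_closer = is_resilient(x=2.0, g=1.0, lam=0.5, y_min=0.0, y_max=1.0)  # 2.0 -> 1.5
```

In both resilient cases some point of $Y$ lies between $x$ and $x - \lambda g$, so the distance of the new point to $Y$ is at most $\lambda |g|$; this is the mechanism behind the bound $\lambda[t-1]L$ used in the optimality proof.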


Let $\{z[t]\}_{t=0}^{\infty}$ be a sequence of estimates such that

$$z[t] = x_{j_t}[t], \quad\text{where } j_t \in \arg\max_{j\in\mathcal{N}[t]} \mathrm{Dist}(x_j[t], Y). \tag{19}$$

From the definition, there is a sequence of agents $\{j_t\}_{t=0}^{\infty}$ associated with the sequence $\{z[t]\}_{t=0}^{\infty}$.

Lemma 5. If there exists $c > 0$ such that

$$\lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = c,$$

then at least one of the following two statements is true.

(A.1) There exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$.

(A.2) There exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$.

In addition, at least one of $(\min Y - c)$ or $(\max Y + c)$ is an accumulation point of $\{z[t]\}_{t=0}^{\infty}$.

Proof. Since $\lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = c > 0$, there exists $m$ such that $z[t] \notin Y$ for each $t \ge m$.
Otherwise, there exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k]\in Y$ for each $k \ge 0$. By definition
of $\mathrm{Dist}(\cdot, Y)$, we have $\mathrm{Dist}(z[t_k], Y) = 0$ for each $k \ge 0$. Then

$$c = \lim_{k\to\infty}\mathrm{Dist}(z[t_k], Y) = 0,$$

contradicting the assumption that $c > 0$.

Since $z[t] \notin Y$ for each $t \ge m$, at least one of the following two statements is true.

(A.1) There exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$.

(A.2) There exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$.

By symmetry, WLOG, assume (A.1) is true. Then, for each $y\in Y$ and each $k \ge 0$, we have

$$z[t_k] < \min Y \le y.$$

Thus,

$$|z[t_k] - y| = y - z[t_k].$$

Minimizing over $y\in Y$, we have

$$\mathrm{Dist}(z[t_k], Y) = \min_{y\in Y}|z[t_k] - y| = \min_{y\in Y}(y - z[t_k]) = \min Y - z[t_k].$$

Thus,

$$z[t_k] = \min Y - \mathrm{Dist}(z[t_k], Y). \tag{20}$$

Recall that the limit of $\mathrm{Dist}(z[t], Y)$ exists and $\lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = c$, and note that
$\{\mathrm{Dist}(z[t_k], Y)\}_{k=0}^{\infty}$ is a subsequence of $\{\mathrm{Dist}(z[t], Y)\}_{t=0}^{\infty}$. Thus, the limit of $\mathrm{Dist}(z[t_k], Y)$ exists,
and

$$\lim_{k\to\infty}\mathrm{Dist}(z[t_k], Y) = \lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = c.$$

Therefore, the limit of $z[t_k]$ exists, and

$$\begin{aligned}
\lim_{k\to\infty} z[t_k] &= \lim_{k\to\infty}\left(\min Y - \mathrm{Dist}(z[t_k], Y)\right) \\
&= \min Y - \lim_{k\to\infty}\mathrm{Dist}(z[t_k], Y) \\
&= \min Y - c.
\end{aligned} \tag{21}$$

Thus, $(\min Y - c)$ is an accumulation point of $\{z[t]\}_{t=0}^{\infty}$.

Similarly, if (A.2) is true, i.e., there exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for
all $k \ge 0$, we can show that $(\max Y + c)$ is an accumulation point of $\{z[t]\}_{t=0}^{\infty}$.

Therefore, Lemma 5 has been proved.

□

Recall that $\{z[t]\}_{t=0}^{\infty}$ is a sequence of estimates such that (19) holds, and that there is a sequence
of agents $\{j_t\}_{t=0}^{\infty}$ associated with the sequence $\{z[t]\}_{t=0}^{\infty}$.

Lemma 6. If

$$\lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = 0, \tag{22}$$

then for each non-faulty agent $i\in\mathcal{N}$, the sequence $\{\mathrm{Dist}(x_i[t], Y)\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty}\mathrm{Dist}(x_i[t], Y) = 0.$$
Proof. For each $i\in\mathcal{N}$, we have

$$\mathrm{Dist}(x_i[t], Y) \le \max_{j\in\mathcal{N}[t]}\mathrm{Dist}(x_j[t], Y) = \mathrm{Dist}(z[t], Y) \qquad \text{by (19)}.$$

Taking limit sup on both sides, we get

$$\limsup_{t\to\infty}\mathrm{Dist}(x_i[t], Y) \le \limsup_{t\to\infty}\mathrm{Dist}(z[t], Y) = 0 \qquad \text{by (22)}.$$

Since $\mathrm{Dist}(x_i[t], Y) \ge 0$ for each $t$, it follows that, for each $i\in\mathcal{N}$, the sequence $\{\mathrm{Dist}(x_i[t], Y)\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty}\mathrm{Dist}(x_i[t], Y) = 0.$$

□

Lemma 5 and Lemma 6, derived in the proof for Algorithm 1, also apply to Algorithm 2 and Algorithm 3.

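Theorem 1. The sequence $\{\mathrm{Dist}(z[t], Y)\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty}\mathrm{Dist}(z[t], Y) = 0.$$

Proof. Recall that $\mathcal{N}[t-1]$ is the set of agents that have not crashed by the end of iteration $t-1$.
There may exist an agent $j$ that crashes during the execution of iteration $t$. If agent $j$ crashes
after performing step 3 in Algorithm 1, then $s_j[t]$ is well-defined. On the contrary, if agent $j$ crashes
before step 3 is conducted, then $s_j[t]$ is not well-defined. In this case, we define $\mathrm{Dist}(s_j[t], Y) = 0$
for ease of exposition. With this convention, $\max_{j\in\mathcal{N}[t-1]}\mathrm{Dist}(s_j[t], Y)$ is well-defined.

Let $j'_{t-1}\in\mathcal{N}[t-1]$ be such that

$$\max_{j\in\mathcal{N}[t-1]}\mathrm{Dist}(s_j[t], Y) = \mathrm{Dist}\left(s_{j'_{t-1}}[t], Y\right). \tag{23}$$

We get

$$\begin{aligned}
\mathrm{Dist}(z[t], Y) &= \max_{j\in\mathcal{N}[t]}\mathrm{Dist}(x_j[t], Y) \qquad \text{due to (19)} \\
&= \max_{j\in\mathcal{N}[t]}\mathrm{Dist}\left(\frac{1}{|\mathcal{R}^2_j[t-1]|}\sum_{i\in\mathcal{R}^2_j[t-1]} s_i[t],\ Y\right) \qquad \text{by (3)} \\
&\le \max_{j\in\mathcal{N}[t]}\frac{1}{|\mathcal{R}^2_j[t-1]|}\sum_{i\in\mathcal{R}^2_j[t-1]}\mathrm{Dist}(s_i[t], Y) \qquad \text{since } \mathrm{Dist}(\cdot, Y) \text{ is convex} \\
&\le \max_{j\in\mathcal{N}[t]}\ \max_{i\in\mathcal{R}^2_j[t-1]}\mathrm{Dist}(s_i[t], Y) \\
&\le \max_{i\in\mathcal{N}[t-1]}\mathrm{Dist}(s_i[t], Y) \\
&= \mathrm{Dist}\left(s_{j'_{t-1}}[t], Y\right) \\
&= \inf_{y\in Y}\left| x_{j'_{t-1}}[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right) - y \right| \qquad \text{by (2)}.
\end{aligned} \tag{24}$$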


















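Recall that $j'_{t-1}$ is defined as in (23). Note that for each $t \ge 0$, there exists an agent $j'_t$ such that
(24) holds, and there exists a sequence of agents $\{j'_t\}_{t=0}^{\infty}$. Let $\{x[t]\}_{t=0}^{\infty}$ be a sequence of estimates
such that

$$x[t] = x_{j'_t}[t]. \tag{25}$$

Let $\{g[t]\}_{t=0}^{\infty}$ be a sequence of gradients such that

$$g[t] = \frac{1}{|\mathcal{R}^1_{j'_t}[t]|}\sum_{i\in\mathcal{R}^1_{j'_t}[t]} h_i'\left(x_{j'_t}[t]\right). \tag{26}$$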


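If $x[t-1] = x_{j'_{t-1}}[t-1]$ is a resilient point with respect to the gradient $g[t-1]$, by Definition
1, we can bound (24) further as follows:

$$\mathrm{Dist}(z[t], Y) \le \inf_{y\in Y}\left| x_{j'_{t-1}}[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right) - y \right| \le \lambda[t-1]L. \tag{27}$$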

If $x_{j'_{t-1}}[t-1]$ is not a resilient point with respect to the gradient $g[t-1]$, then from Definition
1, we know the following, where by (25) and (26) we write $x[t-1] - \lambda[t-1]g[t-1]$ for
$x_{j'_{t-1}}[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right)$.

B1: if $x[t-1]\in Y$, then $x[t-1] - \lambda[t-1]g[t-1] \in Y$;

B2: if $x[t-1] < \min Y$, then $x[t-1] - \lambda[t-1]g[t-1] \le \max Y$;

B3: if $x[t-1] > \max Y$, then $x[t-1] - \lambda[t-1]g[t-1] \ge \min Y$.

We consider two scenarios: scenario 1,

$$x[t-1] - \lambda[t-1]g[t-1] \in Y,$$

and scenario 2,

$$x[t-1] - \lambda[t-1]g[t-1] \notin Y.$$

The first scenario can possibly appear in each of B1, B2, and B3. In contrast, the second scenario
can only appear in B2 and B3.
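Scenario 1: Assume that

$$x[t-1] - \lambda[t-1]g[t-1] = x_{j'_{t-1}}[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right) \in Y.$$

Then it holds that

$$\inf_{y\in Y}\left| x[t-1] - \lambda[t-1]g[t-1] - y \right| = 0 \le \mathrm{Dist}(z[t-1], Y).$$

Thus, (24) can be further bounded as

$$\begin{aligned}
\mathrm{Dist}(z[t], Y) &\le \inf_{y\in Y}\left| x[t-1] - \lambda[t-1]g[t-1] - y \right| = 0 \qquad (28) \\
&\le \mathrm{Dist}(z[t-1], Y). \qquad (29)
\end{aligned}$$

Scenario 2: Assume that

$$x[t-1] - \lambda[t-1]g[t-1] = x_{j'_{t-1}}[t-1] - \frac{\lambda[t-1]}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right) \notin Y = [\min Y, \max Y].$$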

As commented earlier, either B2 holds or B3 holds. In addition, from the assumption of scenario
2, B2 and B3 can be further refined as follows, where $x[t-1] = x_{j'_{t-1}}[t-1]$ and $g[t-1]$ are as in (25) and (26).

B2': $x[t-1] < \min Y$ and $x[t-1] - \lambda[t-1]g[t-1] < \min Y$.

B3': $x[t-1] > \max Y$ and $x[t-1] - \lambda[t-1]g[t-1] > \max Y$.

Suppose B2' is true. As $x[t-1] < \min Y$, and
$g[t-1] = \frac{1}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right)$ is
the gradient of a valid function at point $x_{j'_{t-1}}[t-1]$, from the definition of set $Y$, we know that

$$g[t-1] < 0. \tag{30}$$

In addition, since

$$x[t-1] - \lambda[t-1]g[t-1] < \min Y,$$

it holds that for any $y\in Y$

$$\begin{aligned}
\left| x[t-1] - \lambda[t-1]g[t-1] - y \right| &= y - x[t-1] + \lambda[t-1]g[t-1] \\
&= |y - x[t-1]| + \lambda[t-1]g[t-1] \\
&= |y - x[t-1]| - \lambda[t-1]\,|g[t-1]| \qquad \text{by (30)}.
\end{aligned} \tag{31}$$

Similarly, we can show that (31) still holds for the case when B3' is true. Henceforth, we refer
to (31) as the relation that holds for both B2' and B3', i.e., holds under scenario 2.


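Thus, under scenario 2, we can bound (24) as follows, where $x[t-1] = x_{j'_{t-1}}[t-1]$ and $g[t-1]$ are as in (25) and (26):

$$\begin{aligned}
\mathrm{Dist}(z[t], Y) &\le \inf_{y\in Y}\left| x[t-1] - \lambda[t-1]g[t-1] - y \right| \qquad \text{by (24)} \\
&= \inf_{y\in Y}\left( |y - x[t-1]| - \lambda[t-1]\,|g[t-1]| \right) \qquad \text{by (31)} \\
&= \mathrm{Dist}\left(x_{j'_{t-1}}[t-1], Y\right) - \lambda[t-1]\,|g[t-1]| \\
&\le \mathrm{Dist}(z[t-1], Y) - \lambda[t-1]\,|g[t-1]| \qquad (32) \\
&\le \mathrm{Dist}(z[t-1], Y). \qquad (33)
\end{aligned}$$

Inequality (32) follows from the fact that

$$\mathrm{Dist}\left(x_{j'_{t-1}}[t-1], Y\right) \le \max_{j\in\mathcal{N}[t-1]}\mathrm{Dist}(x_j[t-1], Y) = \mathrm{Dist}(z[t-1], Y),$$

and (33) holds because $\lambda[t-1]\,|g[t-1]| \ge 0$.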
Combining the above analysis for the case when $x_{j'_{t-1}}[t-1]$ is a resilient point and the case when
$x_{j'_{t-1}}[t-1]$ is not a resilient point, by (27), (29) and (33), we obtain the following iteration relation:

$$\mathrm{Dist}(z[t], Y) \le \max\left\{\lambda[t-1]L,\ \mathrm{Dist}(z[t-1], Y)\right\}. \tag{34}$$

Recall from (25) and (26) that $x[t-1] = x_{j'_{t-1}}[t-1]$ and
$g[t-1] = \frac{1}{|\mathcal{R}^1_{j'_{t-1}}[t-1]|}\sum_{i\in\mathcal{R}^1_{j'_{t-1}}[t-1]} h_i'\left(x_{j'_{t-1}}[t-1]\right)$.

We consider two cases: case (i), there are infinitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with
respect to $\{g[t]\}_{t=0}^{\infty}$; and case (ii), there are finitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with
respect to $\{g[t]\}_{t=0}^{\infty}$.


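Case (i): There are infinitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$.
Let $\{t_i\}_{i=0}^{\infty}$ be the maximal sequence of such indices. Since $x[t_i]$ is a resilient point with respect
to $g[t_i]$ for each $i$, then for each $t_i$, by (27), we get

$$\mathrm{Dist}(z[t_i+1], Y) \le \lambda[t_i]L, \tag{35}$$

and for each $t \ne t_i$ for any $i$, by (29) and (33), we get

$$\mathrm{Dist}(z[t+1], Y) \le \mathrm{Dist}(z[t], Y). \tag{36}$$

Taking limit sup on both sides of (35) over $i$, we get

$$0 \le \limsup_{i\to\infty}\mathrm{Dist}(z[t_i+1], Y) \le \limsup_{i\to\infty}\lambda[t_i]L = 0. \tag{37}$$

For each $\tau > t_0$ with $\tau\notin\{t_i\}_{i=0}^{\infty}$, there exists $t_{i(\tau)}$ such that $t_{i(\tau)} < \tau < t_{i(\tau)+1}$. Then, we get

$$\begin{aligned}
\mathrm{Dist}(z[\tau], Y) &\le \mathrm{Dist}\left(z[t_{i(\tau)}+1], Y\right) \qquad \text{due to (36) and that } \tau \ge t_{i(\tau)}+1 \qquad (38) \\
&\le \lambda[t_{i(\tau)}]L \qquad \text{by (35)}. \qquad (39)
\end{aligned}$$

Taking the limit sup on both sides of (38) over $\tau$, where $\tau > t_0$ and $\tau\notin\{t_i\}_{i=0}^{\infty}$, we get

$$\limsup_{\tau\to\infty}\mathrm{Dist}(z[\tau], Y) \le \limsup_{\tau\to\infty}\lambda[t_{i(\tau)}]L = \lim_{\tau\to\infty}\lambda[t_{i(\tau)}]L = 0. \tag{40}$$

From (37), we know that $\forall\epsilon > 0$, $\exists\, i_0$ such that for all $i \ge i_0$, the following holds.

$$\sup\left\{\mathrm{Dist}(z[t_j], Y) : t_j\in\{t_i\}_{i=0}^{\infty},\ j \ge i_0\right\} = \left|\sup\left\{\mathrm{Dist}(z[t_j], Y) : t_j\in\{t_i\}_{i=0}^{\infty},\ j \ge i_0\right\} - 0\right| < \epsilon. \tag{41}$$

From (40), we know that $\forall\epsilon > 0$, $\exists\,\tau^*$, $\tau^*\notin\{t_i\}_{i=0}^{\infty}$, such that for all $\tau \ge \tau^*$, $\tau\notin\{t_i\}_{i=0}^{\infty}$, the
following holds.

$$\sup\left\{\mathrm{Dist}(z[\tau], Y)\right\} = \left|\sup\left\{\mathrm{Dist}(z[\tau], Y)\right\} - 0\right| < \epsilon. \tag{42}$$

Let $t^* = \max\{t_{i_0}, \tau^*\}$. Then for each $\epsilon > 0$ and $t \ge t^*$, we have

$$\begin{aligned}
\sup\left\{\mathrm{Dist}(z[t], Y) : t \ge t^*\right\}
&\le \sup\left(\left\{\mathrm{Dist}(z[t], Y) : t\in\{t_i\}_{i=0}^{\infty},\ t \ge t_{i_0}\right\} \cup \left\{\mathrm{Dist}(z[t], Y) : t\notin\{t_i\}_{i=0}^{\infty},\ t \ge \tau^*\right\}\right) \\
&= \max\left\{\sup\left\{\mathrm{Dist}(z[t], Y) : t\in\{t_i\}_{i=0}^{\infty},\ t \ge t_{i_0}\right\},\ \sup\left\{\mathrm{Dist}(z[t], Y) : t\notin\{t_i\}_{i=0}^{\infty},\ t \ge \tau^*\right\}\right\} \\
&\le \max\{\epsilon, \epsilon\} = \epsilon \qquad \text{by (41) and (42)}.
\end{aligned}$$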
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Thus, we have 

limsupT>ist ( 2 :[i], y) =0. 

>-oo 

Therefore, the limit of Dist {z[t],Y) exists, and 

lim Dist {z[t],Y) = 0. 

t—>-oo 

Case (ii): There are finitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$.

By the assumption of case (ii) we know that there exists a time index $m_0$ such that for all $t \ge m_0$, each $x[t]$ is not a resilient point with respect to $g[t]$. Then, for $t \ge m_0$, (36) holds, so $\mathrm{Dist}(z[t],Y)$ is non-increasing and bounded below by $0$. Thus, by the monotone convergence theorem (MCT), the limit of $\mathrm{Dist}(z[t],Y)$ exists. Let $c \ge 0$ be a nonnegative constant such that

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c. \tag{43}$$

Since $\mathrm{Dist}(z[t],Y) \le \mathrm{Dist}(z[m_0],Y)$ holds for each $t \ge m_0$, we know that $c < \infty$.


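Case (ii.a): Assume that there are infinitely many time indices $t \ge m_0$ such that

$$x[t] - \lambda[t]g[t] = x_{j'_{t+1}}[t] - \lambda[t]\,\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right) \in Y.$$

Let $\{t_k\}_{k=0}^{\infty}$ be the maximal sequence of such time indices. By (28), we have

$$\mathrm{Dist}(z[t_k+1],Y) \le 0.$$

Thus, the limit of $\mathrm{Dist}(z[t_k+1],Y)$ exists and

$$\lim_{k\to\infty} \mathrm{Dist}(z[t_k+1],Y) = 0.$$

Recall from (43) that the limit of $\mathrm{Dist}(z[t],Y)$ exists. The limit of $\mathrm{Dist}(z[t],Y)$ and the limit of $\mathrm{Dist}(z[t_k+1],Y)$ should be identical, i.e.,

$$c = \lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = \lim_{k\to\infty} \mathrm{Dist}(z[t_k+1],Y) = 0, \tag{44}$$

proving the theorem.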


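Case (ii.b): Assume that there are only finitely many time indices $t \ge m_0$ such that

$$x[t] - \lambda[t]g[t] = x_{j'_{t+1}}[t] - \lambda[t]\,\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right) \in Y.$$

Then, there exists² $m' \ge m_0$ such that for each $t \ge m' \ge m_0$, $x[t]$ is not a resilient point with respect to $g[t]$, and $x[t] - \lambda[t]g[t] \notin Y$. Thus, for each $t \ge m' \ge m_0$, (32) holds, i.e.,

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right)\right|.$$

² Recall that $m_0$ is the time index such that for each $t \ge m_0$, $x[t]$ is not a resilient point with respect to $g[t]$.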
Recall that $0 \le c < \infty$ is a nonnegative constant such that $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c$. Next we show that $c = 0$. We prove this by contradiction. Suppose $c > 0$. By Lemma 5, we know that either (A.1) is true or (A.2) is true.

(A.1) There exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$.

(A.2) There exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$.

We also know that at least one of $(\min Y - c)$ or $(\max Y + c)$ is an accumulation point of $\{z[t]\}_{t=0}^{\infty}$, and no other accumulation points exist.

Let $a = \min Y$, $b = \max Y$ and $\epsilon = \frac{c}{2}$. It can be seen from the proof of Lemma 5 that there exists $m$ such that $z[t] \notin Y$ for each $t \ge m$. We consider three scenarios: (A.1) is true but (A.2) is not true; (A.2) is true but (A.1) is not true; both (A.1) and (A.2) are true.


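When (A.1) holds but (A.2) does not hold: That is, there exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$; and there does not exist a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$. Then there exists $m_1 \ge m$ such that $z[t] < \min Y$ for each $t \ge m_1 \ge m$. From the proof of Lemma 5, we know

$$\lim_{t\to\infty} z[t] = \min Y - c = a - c. \tag{45}$$

Since (45) holds, there exists $m_1^* \ge m_1 \ge m$ such that for all $t \ge m_1^* \ge m_1 \ge m$, the following holds.

$$|z[t] - (a - c)| < \epsilon = \frac{c}{2} \ \Longrightarrow\ a - \frac{3c}{2} < z[t] < a - \frac{c}{2}. \tag{46}$$

Since $c > 0$, we have $a - \frac{c}{2} < a$. Then, for each $p(\cdot) \in \mathcal{C}$, $p'(a - \frac{c}{2}) < 0$. Thus,

$$\rho^* = \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) \le 0.$$

Let $K = \sum_{j\in\mathcal{F}} \mathbf{1}\{h_j'(a - \frac{c}{2}) > 0\}$. Define $q(x)$ as follows,

$$q(x) = \frac{1}{|\mathcal{N}| + K}\left(\sum_{j\in\mathcal{N}} h_j(x) + \sum_{j\in\mathcal{F}} h_j(x)\,\mathbf{1}\left\{h_j'\left(a - \frac{c}{2}\right) > 0\right\}\right).$$

It can be easily seen that $q(\cdot) \in \mathcal{C}$ is a valid function and

$$\rho^* = \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) = q'\left(a - \frac{c}{2}\right) < 0.$$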

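Note that when $t \ge m_1^* \ge m_1 \ge m$, (32) may not hold, since it is possible that $x[t] - \lambda[t]g[t] \in Y$. Let $t_1 = \max\{m_1^*, m'\}$. For each $t \ge t_1 = \max\{m_1^*, m'\}$, (32), (45) and (46) hold. We have

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right)\right| && \text{by (32)}\\
&\overset{(a)}{\le} \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| + \lambda[t](M[t] - m[t])L\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]|\rho^*| + \lambda[t](M[t] - m[t])L. && (47)
\end{aligned}
$$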
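Inequality (a) holds because the gradient $h_k'(\cdot)$ is $L$-Lipschitz for each $k \in \mathcal{V}$,

$$\left|x_{j'_{t+1}}[t] - x_k[t]\right| \le \max_{i,j\in\mathcal{N}[t]} \left|x_i[t] - x_j[t]\right| = \max_{i\in\mathcal{N}[t]} x_i[t] - \min_{j\in\mathcal{N}[t]} x_j[t] = M[t] - m[t],$$

and the fact that

$$\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} 1 = 1.$$

Next we show that the last inequality holds. Since $h_i'(\cdot)$ is non-decreasing for each $i \in \mathcal{V}$, the function

$$\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'(\cdot)$$

is non-decreasing. In addition, by (46) we know that $a - \frac{3c}{2} < z[t] < a - \frac{c}{2}$. We get

$$\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right) \le \frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(a - \frac{c}{2}\right) \le \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) = \rho^* < 0.$$

Thus,

$$\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| \ge |\rho^*|, \tag{48}$$

proving the last inequality in (47). Repeatedly applying (47) for $t \ge t_1 = \max\{m_1^*, m'\}$, we get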


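$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{t} \lambda[r]\right)|\rho^*| + \sum_{r=t_1}^{t} \lambda[r](M[r] - m[r])L. \tag{49}$$

Taking limit on both sides of (49), we obtain

$$
\begin{aligned}
\lim_{t\to\infty} \mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{\infty} \lambda[r]\right)|\rho^*| + \sum_{r=t_1}^{\infty} \lambda[r](M[r] - m[r])L\\
&\le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{\infty} \lambda[r]\right)|\rho^*| + \sum_{r=0}^{\infty} \lambda[r](M[r] - m[r])L\\
&\overset{(a)}{=} \mathrm{Dist}(z[t_1],Y) - \infty + C_1\\
&= -\infty.
\end{aligned}
$$

Equality (a) is true due to (16), the fact that $|\rho^*| > 0$ and that

$$\sum_{r=t_1}^{\infty} \lambda[r] = \sum_{r=0}^{\infty} \lambda[r] - \sum_{r=0}^{t_1-1} \lambda[r] = \infty - \sum_{r=0}^{t_1-1} \lambda[r] = \infty.$$

On the other hand, we know $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c > 0$. This is a contradiction. Thus,

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c = 0. \tag{50}$$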
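When (A.2) holds but (A.1) does not hold: That is, there does not exist a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$; and there exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$. Recall that $z[t] \notin Y$ for each $t \ge m$. Then there exists $m_2 \ge m$ such that $z[t] > \max Y$ for each $t \ge m_2$. From the proof of Lemma 5, we get

$$\lim_{t\to\infty} z[t] = \max Y + c = b + c. \tag{51}$$

Since (51) holds, there exists $m_2^* \ge m_2 \ge m$ such that for all $t \ge m_2^* \ge m_2 \ge m$, the following holds.

$$|z[t] - (b + c)| < \epsilon = \frac{c}{2} \ \Longrightarrow\ b + \frac{c}{2} < z[t] < b + \frac{3c}{2}. \tag{52}$$

Since $c > 0$, we have $b + \frac{c}{2} > b$. Then, for each $p(\cdot) \in \mathcal{C}$, $p'(b + \frac{c}{2}) > 0$. Then,

$$\rho = \inf_{p(\cdot)\in\mathcal{C}} p'\left(b + \frac{c}{2}\right) \ge 0.$$

Let $K = \sum_{j\in\mathcal{F}} \mathbf{1}\{h_j'(b + \frac{c}{2}) < 0\}$. Define $q(x)$ as follows,

$$q(x) = \frac{1}{|\mathcal{N}| + K}\left(\sum_{j\in\mathcal{N}} h_j(x) + \sum_{j\in\mathcal{F}} h_j(x)\,\mathbf{1}\left\{h_j'\left(b + \frac{c}{2}\right) < 0\right\}\right).$$

It can be easily seen that $q(\cdot) \in \mathcal{C}$ is a valid function and

$$\inf_{p(\cdot)\in\mathcal{C}} p'\left(b + \frac{c}{2}\right) = q'\left(b + \frac{c}{2}\right) > 0.$$

Then, $\rho = q'(b + \frac{c}{2}) > 0$.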


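Note that when $t \ge m_2^* \ge m_2 \ge m$, (32) may not hold, since it is possible that $x[t] - \lambda[t]g[t] \in Y$. Let $t_2 = \max\{m_2^*, m'\}$. For each $t \ge t_2 = \max\{m_2^*, m'\}$, (32), (51) and (52) hold. We have

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right)\right| && \text{by (32)}\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| + \lambda[t](M[t] - m[t])L\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]|\rho| + \lambda[t]L(M[t] - m[t]). && (53)
\end{aligned}
$$

Next we show that the last inequality holds. Recall that the function

$$\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'(\cdot)$$

is non-decreasing. We get

$$
\begin{aligned}
\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right) &\ge \frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(b + \frac{c}{2}\right) && \text{by (52)}\\
&\ge \inf_{p(\cdot)\in\mathcal{C}} p'\left(b + \frac{c}{2}\right) = \rho > 0, && (54)
\end{aligned}
$$

proving the last inequality in (53). Repeatedly applying (53) for $t \ge t_2 = \max\{m_2^*, m'\}$, we get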


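$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t_2],Y) - \left(\sum_{r=t_2}^{t} \lambda[r]\right)|\rho| + \sum_{r=t_2}^{t} \lambda[r]L(M[r] - m[r]). \tag{55}$$

Taking limit on both sides of (55), we obtain

$$
\begin{aligned}
\lim_{t\to\infty} \mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t_2],Y) - \left(\sum_{r=t_2}^{\infty} \lambda[r]\right)|\rho| + \sum_{r=t_2}^{\infty} \lambda[r](M[r] - m[r])L\\
&\le \mathrm{Dist}(z[t_2],Y) - \left(\sum_{r=t_2}^{\infty} \lambda[r]\right)|\rho| + \sum_{r=0}^{\infty} \lambda[r](M[r] - m[r])L\\
&\le \mathrm{Dist}(z[t_2],Y) - \infty + C_1\\
&= -\infty.
\end{aligned}
$$

This inequality is obtained similarly to the inequality (50). On the other hand, we know $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c > 0$. This is a contradiction. Thus,

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c = 0.$$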

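Both (A.1) and (A.2) hold: Let $\{z[t_k]\}_{k=0}^{\infty}$ be a maximal subsequence of $\{z[t]\}_{t=0}^{\infty}$ such that $t_k \ge m'$ and $z[t_k] < \min Y$ for all $k \ge 0$. Let $\{z[t'_k]\}_{k=0}^{\infty}$ be a maximal subsequence of $\{z[t]\}_{t=0}^{\infty}$ such that $t'_k \ge m'$ and $z[t'_k] > \max Y$ for all $k \ge 0$. Recall that $z[t] \notin Y$ for each $t \ge m$. Then,

$$\{z[t_k]\}_{k=0}^{\infty} \cup \{z[t'_k]\}_{k=0}^{\infty} = \{z[t]\}_{t=m'}^{\infty}.$$

By Lemma 5, we know

$$\lim_{k\to\infty} z[t_k] = \min Y - c = a - c \quad\text{and}\quad \lim_{k\to\infty} z[t'_k] = \max Y + c = b + c. \tag{56}$$

Since $\{z[t_k]\}_{k=0}^{\infty} \cup \{z[t'_k]\}_{k=0}^{\infty} = \{z[t]\}_{t=m'}^{\infty}$, there exists $m_3 \ge m$ such that for each $t \ge m_3$,

$$a - \frac{3c}{2} < z[t] < a - \frac{c}{2} \quad\text{or}\quad b + \frac{c}{2} < z[t] < b + \frac{3c}{2}. \tag{57}$$

Recall that

$$\rho^* = \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) = q'\left(a - \frac{c}{2}\right) \quad\text{and}\quad \rho = \inf_{p(\cdot)\in\mathcal{C}} p'\left(b + \frac{c}{2}\right) = q'\left(b + \frac{c}{2}\right).$$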
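Recall that for each $t \ge m' \ge m_0$, $x[t]$ is not a resilient point with respect to $g[t]$, and $x[t] - \lambda[t]g[t] \notin Y$. Thus (32) holds, i.e.,

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right)\right|.$$

Since (57) holds, by (48) and (54), we get

$$\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| \ge |\rho^*| \quad\text{or}\quad \left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| \ge |\rho|.$$

Thus,

$$\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| \ge \min\{|\rho^*|, |\rho|\}. \tag{58}$$

We get

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(x_{j'_{t+1}}[t]\right)\right| && \text{by (32)}\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}^1_{j'_{t+1}}[t]|}\sum_{i\in\mathcal{R}^1_{j'_{t+1}}[t]} h_i'\left(z[t]\right)\right| + \lambda[t](M[t] - m[t])L\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]\min\{|\rho^*|, |\rho|\} + \lambda[t]L(M[t] - m[t]) && \text{by (58)} \quad (59)
\end{aligned}
$$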

Repeatedly applying (59) for $t \ge t_3 = \max\{m_3, m'\}$, we get

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t_3],Y) - \left(\sum_{r=t_3}^{t} \lambda[r]\right)\min\{|\rho^*|, |\rho|\} + \sum_{r=t_3}^{t} \lambda[r]L(M[r] - m[r]). \tag{60}$$

Taking limit on both sides of (60), we obtain

$$
\begin{aligned}
\lim_{t\to\infty} \mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t_3],Y) - \left(\sum_{r=t_3}^{\infty} \lambda[r]\right)\min\{|\rho^*|, |\rho|\} + \sum_{r=t_3}^{\infty} \lambda[r]L(M[r] - m[r])\\
&\le \mathrm{Dist}(z[t_3],Y) - \infty + C_1\\
&= -\infty.
\end{aligned}
$$

This inequality is obtained similarly to the inequality (50).

On the other hand, we know $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c > 0$. This is a contradiction. Thus,

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c = 0.$$

The proof is complete.

□
3.2 Algorithm 2

In Algorithm 1, in each iteration $t \ge 1$, there are two rounds of information exchange. Next we present a simpler algorithm that requires only one message sent by each agent per iteration. In this algorithm, each agent $j$ maintains one local estimate $x_j$, where $x_j[0]$ is an arbitrary input at agent $j$.


Algorithm 2 for agent $j$ for iteration $t \ge 1$:

Step 1: Compute $h_j'(x_j[t-1])$, the gradient of local function $h_j(\cdot)$ at point $x_j[t-1]$, and send the tuple $(x_j[t-1], h_j'(x_j[t-1]))$ to all the agents (including agent $j$ itself).

Step 2: Let $\mathcal{R}_j[t-1]$ denote the set of tuples of the form $(x_i[t-1], h_i'(x_i[t-1]))$ received as a result of step 1. Update $x_j$ as

$$x_j[t] = \frac{1}{|\mathcal{R}_j[t-1]|}\sum_{i\in\mathcal{R}_j[t-1]} \left(x_i[t-1] - \lambda[t-1]\,h_i'(x_i[t-1])\right). \tag{61}$$

Note that $\mathcal{R}_j[t-1] \subseteq \mathcal{N}[t-1]$. In addition, set $Y$ is the same as that defined earlier for Algorithm 1.
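The two steps of Algorithm 2 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the quadratic local costs, the step size $\lambda[t-1] = 1/t$, the crash schedule, and every function and variable name are assumptions chosen only to make the update (61) concrete.

```python
def algorithm2_step(estimates, gradients, received, step):
    """Apply update (61) once for every agent that is still running.

    estimates: dict agent -> x_i[t-1]
    gradients: dict agent -> callable returning h_i'(x)
    received:  dict agent j -> set of agents whose tuples reached j
    step:      the step size lambda[t-1]
    """
    new_estimates = {}
    for j, senders in received.items():
        senders = sorted(senders)
        total = sum(estimates[i] - step * gradients[i](estimates[i]) for i in senders)
        new_estimates[j] = total / len(senders)
    return new_estimates


def run_demo(iterations=200):
    # Hypothetical local costs h_i(x) = (x - c_i)^2 / 2, so h_i'(x) = x - c_i.
    centers = {0: -1.0, 1: 0.0, 2: 2.0, 3: 5.0}
    gradients = {i: (lambda x, c=c: x - c) for i, c in centers.items()}
    estimates = {0: 10.0, 1: -4.0, 2: 3.0, 3: 0.5}
    for t in range(1, iterations + 1):
        if t < 5:
            received = {j: {0, 1, 2, 3} for j in (0, 1, 2, 3)}
        elif t == 5:
            # Agent 3 crashes while broadcasting: only agent 0 hears it.
            received = {0: {0, 1, 2, 3}, 1: {0, 1, 2}, 2: {0, 1, 2}}
        else:
            received = {j: {0, 1, 2} for j in (0, 1, 2)}
        estimates = algorithm2_step(estimates, gradients, received, 1.0 / t)
    return estimates
```

In this toy run the surviving agents disagree only in the iteration where the crash splits the received sets, and they agree again one iteration later. For these assumed costs, the minimizer of the three surviving agents' average cost is $1/3$ and the minimizer with agent 3 fully counted is $1.5$; the final estimates land between the two.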


Lemma 7. Under Algorithm 2, the sequence $\{M[t] - m[t]\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty} (M[t] - m[t]) = 0.$$

Recall that $M[t] = \max_{j\in\mathcal{N}[t]} x_j[t]$ and $m[t] = \min_{j\in\mathcal{N}[t]} x_j[t]$. Lemma 7 implies that asymptotic consensus is achieved under Algorithm 2. The proof of Lemma 7 is similar to the proof of Lemma 3, and is omitted.

Lemma 8. Under Algorithm 2, the following holds.

$$\sum_{t=0}^{\infty} (M[t] - m[t]) < \infty.$$

The proof of Lemma 8 is similar to the proof of Lemma 4, and is omitted. By Lemma 8, we know there exists some constant $C_2$ such that for any $t \ge 0$,

$$\sum_{r=t}^{\infty} (M[r] - m[r]) \le \sum_{r=0}^{\infty} (M[r] - m[r]) \le C_2.$$

The following corollary is an immediate consequence of Lemma 8.

Corollary 2. Under Algorithm 2,

$$\lim_{t\to\infty} \lambda[t](M[t] - m[t]) = 0 \quad\text{and}\quad \lim_{t\to\infty} \sum_{r=t}^{\infty} \lambda[r](M[r] - m[r]) = 0. \tag{62}$$
The proof of Corollary 2 is similar to the proof of Corollary 1, and is omitted.

In our convergence analysis, we will use the well-known "almost supermartingale" convergence theorem in [18], which can also be found as Lemma 11 in Chapter 2.2 of [17]. We present a simpler deterministic version of the theorem in the next lemma.

Lemma 9. [18] Let $\{a_t\}_{t=0}^{\infty}$, $\{b_t\}_{t=0}^{\infty}$, and $\{c_t\}_{t=0}^{\infty}$ be non-negative sequences. Suppose that

$$a_{t+1} \le a_t - b_t + c_t \quad\text{for all } t \ge 0,$$

and $\sum_{t=0}^{\infty} c_t < \infty$. Then $\sum_{t=0}^{\infty} b_t < \infty$ and the sequence $\{a_t\}_{t=0}^{\infty}$ converges to a non-negative value.
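A small numerical experiment can make the deterministic lemma tangible. The sketch below assumes the standard form of the hypothesis, $a_{t+1} \le a_t - b_t + c_t$ with non-negative terms and summable $c_t$; the particular choices $b_t = 0.1\,a_t$ and $c_t = 1/(t+1)^2$ are illustrative assumptions, not taken from the paper.

```python
def run_recursion(a0=5.0, steps=10_000):
    """Iterate a_{t+1} = a_t - b_t + c_t and track the partial sums of b_t and c_t."""
    a, sum_b, sum_c = a0, 0.0, 0.0
    for t in range(steps):
        b = 0.1 * a               # non-negative because a stays non-negative
        c = 1.0 / (t + 1) ** 2    # summable perturbation
        a = a - b + c             # equality, so the lemma's inequality holds
        sum_b += b
        sum_c += c
    return a, sum_b, sum_c
```

Telescoping the recursion gives $\sum_{t<T} b_t = a_0 - a_T + \sum_{t<T} c_t$, so the partial sums of $b_t$ stay below $a_0 + \sum_t c_t$, which is the mechanism behind the lemma's conclusion.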

Recall that set $Y$ is the same as that defined earlier for Algorithm 1. We define $z[t]$ and $x_{j_t}$ similarly to Algorithm 1. In particular, let $\{z[t]\}_{t=0}^{\infty}$ be a sequence of estimates such that

$$z[t] = x_{j_t}[t], \quad\text{where } j_t \in \arg\max_{j\in\mathcal{N}[t]} \mathrm{Dist}(x_j[t], Y). \tag{63}$$

From the definition, there is a sequence of agents $\{j_t\}_{t=0}^{\infty}$ associated with the sequence $\{z[t]\}_{t=0}^{\infty}$.
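The selection rule in (63) picks, at each iteration, an estimate that is farthest from $Y$. A minimal sketch follows, assuming $Y$ is the closed interval $[\min Y, \max Y]$ as used later in the proof; the function names are hypothetical.

```python
def dist_to_interval(x, lo, hi):
    """Distance from the scalar x to the closed interval [lo, hi]."""
    if lo > hi:
        raise ValueError("interval endpoints are out of order")
    return max(lo - x, x - hi, 0.0)


def farthest_agent(estimates, lo, hi):
    """Return (agent, estimate) for an agent whose estimate is farthest from [lo, hi]."""
    return max(estimates.items(), key=lambda item: dist_to_interval(item[1], lo, hi))
```

For example, with estimates `{1: 0.2, 2: 3.5, 3: -1.0}` and the interval `[0, 1]`, the distances are 0, 2.5 and 1.0, so agent 2 is selected.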

Theorem 2. The sequence $\{\mathrm{Dist}(z[t],Y)\}_{t=0}^{\infty}$ converges and

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = 0.$$

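Proof. We first try to derive an iteration relation similar to that in (34).

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &= \mathrm{Dist}\left(x_{j_{t+1}}[t+1], Y\right) && \text{by (63)}\\
&= \mathrm{Dist}\left(\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{i\in\mathcal{R}_{j_{t+1}}[t]} \left(x_i[t] - \lambda[t]h_i'(x_i[t])\right),\ Y\right) && \text{by (61)}\\
&= \mathrm{Dist}\left(\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{i\in\mathcal{R}_{j_{t+1}}[t]} \left(x_i[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t])\right),\ Y\right)\\
&\le \frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{i\in\mathcal{R}_{j_{t+1}}[t]} \mathrm{Dist}\left(x_i[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]),\ Y\right) && \text{by convexity of } \mathrm{Dist}(\cdot, Y)\\
&\le \max_{i\in\mathcal{R}_{j_{t+1}}[t]} \mathrm{Dist}\left(x_i[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]),\ Y\right). && (64)
\end{aligned}
$$

Let

$$j'_{t+1} \in \arg\max_{i\in\mathcal{R}_{j_{t+1}}[t]} \mathrm{Dist}\left(x_i[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]),\ Y\right). \tag{65}$$

Note that $j'_{t+1} \in \mathcal{R}_{j_{t+1}}[t] \subseteq \mathcal{N}[t]$, i.e., $j'_{t+1} \in \mathcal{N}[t]$.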
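We get

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \max_{i\in\mathcal{R}_{j_{t+1}}[t]} \mathrm{Dist}\left(x_i[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]),\ Y\right) && \text{by (64)}\\
&= \mathrm{Dist}\left(x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]),\ Y\right) && \text{by (65)}\\
&= \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]) - y\right|. && (66)
\end{aligned}
$$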

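For each $y \in Y$, we have

$$
\begin{aligned}
&\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]) - y\right|\\
&\quad= \left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y + \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} \left(h_k'(x_{j'_{t+1}}[t]) - h_k'(x_k[t])\right)\right|\\
&\quad\le \left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\left|\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} \left(h_k'(x_{j'_{t+1}}[t]) - h_k'(x_k[t])\right)\right|\\
&\quad\le \left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} \left|h_k'(x_{j'_{t+1}}[t]) - h_k'(x_k[t])\right|\\
&\quad\overset{(a)}{\le} \left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} L\left|x_{j'_{t+1}}[t] - x_k[t]\right|\\
&\quad\overset{(b)}{\le} \left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + \lambda[t]L(M[t] - m[t]). \qquad (67)
\end{aligned}
$$

Inequality (a) holds because the gradient $h_k'(\cdot)$ is $L$-Lipschitz for each $k \in \mathcal{V}$. Inequality (b) holds from the fact that

$$\left|x_{j'_{t+1}}[t] - x_k[t]\right| \le \max_{i,j\in\mathcal{N}[t]} (x_i[t] - x_j[t]) = \max_{i\in\mathcal{N}[t]} x_i[t] - \min_{j\in\mathcal{N}[t]} x_j[t] = M[t] - m[t],$$

and that

$$\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} 1 = 1.$$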
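Using (67), the inequality (66) can be further bounded as

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_k[t]) - y\right|\\
&\le \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + \lambda[t]L(M[t] - m[t]) && \text{by (67).} \quad (68)
\end{aligned}
$$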


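Note that for each $t \ge 0$, there exists a non-faulty agent $j'_{t+1}$ such that (67) holds, and there exists a sequence of agents $\{j'_{t+1}\}_{t=0}^{\infty}$. Let $\{x[t]\}_{t=0}^{\infty}$ be a sequence of estimates such that

$$x[t] = x_{j'_{t+1}}[t]. \tag{69}$$

Let $\{g[t]\}_{t=0}^{\infty}$ be a sequence of gradients such that

$$g[t] = \frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'\left(x_{j'_{t+1}}[t]\right). \tag{70}$$

If $x[t] = x_{j'_{t+1}}[t]$ is a resilient point with respect to the gradient $g[t]$, by Definition 1, we bound (68) further as

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + L\lambda[t](M[t] - m[t])\\
&\le L\lambda[t] + L\lambda[t](M[t] - m[t]). && (71)
\end{aligned}
$$

If $x[t] = x_{j'_{t+1}}[t]$ is not a resilient point with respect to the gradient $g[t]$, then from Definition 1, we know that

C1: if $x_{j'_{t+1}}[t] \in Y$, then $x_{j'_{t+1}}[t] - \lambda[t]g[t] \in Y$,

C2: if $x_{j'_{t+1}}[t] < \min Y$, then $x_{j'_{t+1}}[t] - \lambda[t]g[t] \le \max Y$,

C3: if $x_{j'_{t+1}}[t] > \max Y$, then $x_{j'_{t+1}}[t] - \lambda[t]g[t] \ge \min Y$.

We consider two scenarios: scenario 1,

$$x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \in Y,$$

and scenario 2,

$$x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \notin Y.$$

The first scenario can possibly appear in each of C1, C2, and C3. In contrast, the second scenario can only appear in C2 and C3.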
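Scenario 1: Assume that

$$x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \in Y.$$

Then it holds that

$$\inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| = 0 \le \mathrm{Dist}(z[t],Y).$$

Thus, (68) can be bounded as

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + L\lambda[t](M[t] - m[t])\\
&\le 0 + L\lambda[t](M[t] - m[t]) && (72)\\
&\le \mathrm{Dist}(z[t],Y) + L\lambda[t](M[t] - m[t]). && (73)
\end{aligned}
$$

Scenario 2: Assume that

$$x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \notin Y = [\min Y, \max Y].$$

As commented earlier, either C2 holds or C3 holds. In addition, from the assumption of scenario 2, C2 and C3 can be further refined as follows.

C2': $x_{j'_{t+1}}[t] < \min Y$ and $x_{j'_{t+1}}[t] - \lambda[t]g[t] < \min Y$,

C3': $x_{j'_{t+1}}[t] > \max Y$ and $x_{j'_{t+1}}[t] - \lambda[t]g[t] > \max Y$.

Similar to (30), it can be shown that for both C2' and C3', the following holds.

$$\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| = \left|x_{j'_{t+1}}[t] - y\right| - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right|. \tag{74}$$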

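Thus, under scenario 2, we can bound (68) as

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \inf_{y\in Y}\left|x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) - y\right| + L\lambda[t](M[t] - m[t])\\
&= \inf_{y\in Y}\left(\left|x_{j'_{t+1}}[t] - y\right| - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right|\right) + L\lambda[t](M[t] - m[t]) && \text{by (74)}\\
&= \mathrm{Dist}\left(x_{j'_{t+1}}[t], Y\right) - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right| + L\lambda[t](M[t] - m[t])\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right| + L\lambda[t](M[t] - m[t]) && (75)\\
&\le \mathrm{Dist}(z[t],Y) + L\lambda[t](M[t] - m[t]). && (76)
\end{aligned}
$$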
By (71), (73) and (76), for each $t \ge 0$, we obtain the following iteration relation

$$\mathrm{Dist}(z[t+1],Y) \le \max\{\lambda[t]L,\ \mathrm{Dist}(z[t],Y)\} + \lambda[t]L(M[t] - m[t]). \tag{77}$$

Recall from (69) and (70) that $x[t] = x_{j'_{t+1}}[t]$ and $g[t] = \frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])$. Similar to the proof of Theorem 1, we consider two cases: case (i), there are infinitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$; and case (ii), there are finitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$.

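Case (i): There are infinitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$.
Let $\{t_i\}_{i=0}^{\infty}$ be the maximal sequence of such indices. Since $x[t_i]$ is a resilient point with respect to $g[t_i]$ for each $i$, then for each $t_i$, by (71), we have

$$\mathrm{Dist}(z[t_i+1],Y) \le \lambda[t_i]L + \lambda[t_i]L(M[t_i] - m[t_i]), \tag{78}$$

and for each $t \ne t_i$ for any $i$, by (73) and (76), we get

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t],Y) + \lambda[t]L(M[t] - m[t]). \tag{79}$$

Taking limit sup on both sides of (78), we get

$$
\begin{aligned}
\limsup_{i\to\infty} \mathrm{Dist}(z[t_i+1],Y) &\le \limsup_{i\to\infty} \lambda[t_i]L + \limsup_{i\to\infty} \lambda[t_i]L(M[t_i] - m[t_i])\\
&= 0 + 0 = 0 && \text{by Corollary 2.} \quad (80)
\end{aligned}
$$

In addition, $\liminf_{i\to\infty} \mathrm{Dist}(z[t_i+1],Y) \ge 0$. Thus, the limit of $\mathrm{Dist}(z[t_i+1],Y)$ exists, and

$$\lim_{i\to\infty} \mathrm{Dist}(z[t_i+1],Y) = 0.$$

For each $\tau > t_0$ and $\tau \notin \{t_i\}_{i=0}^{\infty}$, there exists $t_{i(\tau)}$ such that $t_{i(\tau)} < \tau < t_{i(\tau)+1}$. Repeatedly applying (79), we get

$$
\begin{aligned}
\mathrm{Dist}(z[\tau+1],Y) &\le \mathrm{Dist}\left(z[t_{i(\tau)}+1],Y\right) + \sum_{r=t_{i(\tau)}+1}^{\tau} \lambda[r]L(M[r] - m[r])\\
&\le \lambda[t_{i(\tau)}]L + \lambda[t_{i(\tau)}]L\left(M[t_{i(\tau)}] - m[t_{i(\tau)}]\right) + \sum_{r=t_{i(\tau)}+1}^{\tau} \lambda[r](M[r] - m[r])L && \text{by (78)}\\
&= \lambda[t_{i(\tau)}]L + \sum_{r=t_{i(\tau)}}^{\tau} \lambda[r](M[r] - m[r])L\\
&\le \lambda[t_{i(\tau)}]L + \sum_{r=t_{i(\tau)}}^{\infty} \lambda[r](M[r] - m[r])L && \text{since } \lambda[r](M[r] - m[r])L \ge 0,\ \forall r. \quad (81)
\end{aligned}
$$

Taking limit sup on both sides of (81), we get

$$
\begin{aligned}
\limsup_{\tau\to\infty} \mathrm{Dist}(z[\tau+1],Y) &\le \lim_{\tau\to\infty} \lambda[t_{i(\tau)}]L + \lim_{\tau\to\infty} \sum_{r=t_{i(\tau)}}^{\infty} \lambda[r](M[r] - m[r])L\\
&= 0 + 0 = 0 && \text{by Corollary 2.}
\end{aligned}
$$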
To apply Corollary 2 here we need $t_{i(\tau)} \to \infty$ when $\tau \to \infty$. This is true since there are infinitely many resilient points.

Using an argument similar to the one used earlier in the proof of Theorem 1, we conclude that $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y)$ exists and $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = 0$.

Case (ii): There are finitely many points in $\{x[t]\}_{t=0}^{\infty}$ that are resilient with respect to $\{g[t]\}_{t=0}^{\infty}$.

By the assumption in case (ii) we know that there exists a time index $m_0$ such that for all $t \ge m_0$, each $x[t]$ is not a resilient point with respect to $g[t]$. Thus, for $t \ge m_0$, either (73) or (76) holds. Thus, for $t \ge m_0$, we have

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t],Y) + \lambda[t]L(M[t] - m[t]). \tag{82}$$

Define $\{a_r\}_{r=0}^{\infty}$, $\{b_r\}_{r=0}^{\infty}$, $\{c_r\}_{r=0}^{\infty}$ as follows.

$$a_r = \mathrm{Dist}(z[m_0 + r],Y), \qquad b_r = 0, \qquad c_r = \lambda[m_0 + r]L(M[m_0 + r] - m[m_0 + r]).$$

By Lemma 8 and Lemma 9, we know the limit of $\mathrm{Dist}(z[t],Y)$ exists. Let $c \ge 0$ be a nonnegative constant such that

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c. \tag{83}$$

Repeatedly applying (82), we get

$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t],Y) + \lambda[t]L(M[t] - m[t])\\
&\le \mathrm{Dist}(z[m_0],Y) + \sum_{r=m_0}^{t} \lambda[r]L(M[r] - m[r])\\
&\le \mathrm{Dist}(z[m_0],Y) + \sum_{r=m_0}^{\infty} \lambda[r]L(M[r] - m[r])\\
&\le \mathrm{Dist}(z[m_0],Y) + \sum_{r=0}^{\infty} \lambda[r]L(M[r] - m[r])\\
&\le \mathrm{Dist}(z[m_0],Y) + C_2 && \text{by (62).} \quad (84)
\end{aligned}
$$

Thus, by (84), we know that for each $t \ge m_0$,

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[m_0],Y) + C_2.$$

Thus,

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c < \infty.$$

Case (ii.a): Assume that there are infinitely many time indices $t \ge m_0$ such that

$$x[t] - \lambda[t]g[t] = x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \in Y.$$

Let $\{t_k\}_{k=0}^{\infty}$ be the maximal sequence of such indices. By (72), we have

$$\mathrm{Dist}(z[t_k+1],Y) \le 0 + L\lambda[t_k](M[t_k] - m[t_k]). \tag{85}$$
Taking limit on both sides of (85), we get

$$
\begin{aligned}
\lim_{k\to\infty} \mathrm{Dist}(z[t_k+1],Y) &\le 0 + L\lim_{k\to\infty} \lambda[t_k](M[t_k] - m[t_k])\\
&= 0 + 0 = 0 && \text{by Corollary 2.}
\end{aligned}
$$

On the other hand, $\lim_{k\to\infty} \mathrm{Dist}(z[t_k+1],Y) = c \ge 0$. Thus,

$$c = \lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = \lim_{k\to\infty} \mathrm{Dist}(z[t_k+1],Y) = 0,$$

proving the theorem.


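Case (ii.b): Assume that there are only finitely many time indices $t \ge m_0$ such that

$$x[t] - \lambda[t]g[t] = x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \in Y.$$

Then, there exists $m' \ge m_0$ such that for each $t \ge m' \ge m_0$, $x[t]$ is not a resilient point with respect to $g[t]$, and

$$x[t] - \lambda[t]g[t] = x_{j'_{t+1}}[t] - \lambda[t]\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t]) \notin Y.$$

Thus, for each $t \ge m' \ge m_0$, (75) holds, i.e.,

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right| + L\lambda[t](M[t] - m[t]).$$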

Recall that $0 \le c < \infty$ is a nonnegative constant such that $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c$. Next we show that $c = 0$. We prove this by contradiction. Suppose $c > 0$. By Lemma 5, we know that either (A.1) is true or (A.2) is true.

(A.1) There exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$.

(A.2) There exists a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$.

In addition, at least one of $(\min Y - c)$ or $(\max Y + c)$ is an accumulation point of $\{z[t]\}_{t=0}^{\infty}$, and no other accumulation points exist.

Let $a = \min Y$, $b = \max Y$ and $\epsilon = \frac{c}{2}$. It can be seen from the proof of Lemma 5 that there exists $m$ such that $z[t] \notin Y$ for each $t \ge m$. We consider three scenarios: (A.1) is true but (A.2) is not true; (A.2) is true but (A.1) is not true; both (A.1) and (A.2) are true.

When (A.1) holds but (A.2) does not hold: That is, there exists a subsequence $\{z[t_k]\}_{k=0}^{\infty}$ such that $z[t_k] < \min Y$ for all $k \ge 0$; and there does not exist a subsequence $\{z[t'_k]\}_{k=0}^{\infty}$ such that $z[t'_k] > \max Y$ for all $k \ge 0$. Then there exists $m_1 \ge m$ such that $z[t] < \min Y$ for each $t \ge m_1 \ge m$. From the proof of Lemma 5, we know

$$\lim_{t\to\infty} z[t] = \min Y - c = a - c.$$

Since (45) holds, there exists $m_1^* \ge m_1 \ge m$ such that for all $t \ge m_1^* \ge m_1 \ge m$, the following holds.

$$|z[t] - (a - c)| < \epsilon = \frac{c}{2} \ \Longrightarrow\ a - \frac{3c}{2} < z[t] < a - \frac{c}{2}. \tag{86}$$
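Since $c > 0$, we have $a - \frac{c}{2} < a$. Then, for each $p(\cdot) \in \mathcal{C}$, $p'(a - \frac{c}{2}) < 0$. Then,

$$\rho^* = \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) \le 0.$$

Let $K = \sum_{j\in\mathcal{F}} \mathbf{1}\{h_j'(a - \frac{c}{2}) > 0\}$. Define $q(x)$ as follows,

$$q(x) = \frac{1}{|\mathcal{N}| + K}\left(\sum_{j\in\mathcal{N}} h_j(x) + \sum_{j\in\mathcal{F}} h_j(x)\,\mathbf{1}\left\{h_j'\left(a - \frac{c}{2}\right) > 0\right\}\right).$$

It can be easily seen that $q(\cdot) \in \mathcal{C}$ is a valid function and

$$\rho^* = \sup_{p(\cdot)\in\mathcal{C}} p'\left(a - \frac{c}{2}\right) = q'\left(a - \frac{c}{2}\right) < 0.$$

Note that when $t \ge m_1^* \ge m_1 \ge m$, (75) may not hold, since it is possible that $x[t] - \lambda[t]g[t] \in Y$. Let $t_1 = \max\{m_1^*, m'\}$. For each $t \ge t_1 = \max\{m_1^*, m'\}$, (75), (45) and (46) hold. We have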


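$$
\begin{aligned}
\mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(x_{j'_{t+1}}[t])\right| + \lambda[t]L(M[t] - m[t]) && \text{by (75)}\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]\left|\frac{1}{|\mathcal{R}_{j_{t+1}}[t]|}\sum_{k\in\mathcal{R}_{j_{t+1}}[t]} h_k'(z[t])\right| + 2\lambda[t]L(M[t] - m[t])\\
&\le \mathrm{Dist}(z[t],Y) - \lambda[t]|\rho^*| + 2\lambda[t]L(M[t] - m[t]). && (87)
\end{aligned}
$$

Repeatedly applying (87) for $t \ge t_1 = \max\{m_1^*, m'\}$, we get

$$\mathrm{Dist}(z[t+1],Y) \le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{t} \lambda[r]\right)|\rho^*| + 2\sum_{r=t_1}^{t} \lambda[r]L(M[r] - m[r]). \tag{88}$$

Taking limit on both sides of (88), we obtain

$$
\begin{aligned}
\lim_{t\to\infty} \mathrm{Dist}(z[t+1],Y) &\le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{\infty} \lambda[r]\right)|\rho^*| + 2\sum_{r=t_1}^{\infty} \lambda[r]L(M[r] - m[r])\\
&\le \mathrm{Dist}(z[t_1],Y) - \left(\sum_{r=t_1}^{\infty} \lambda[r]\right)|\rho^*| + 2C_2 && \text{by (62)}\\
&= \mathrm{Dist}(z[t_1],Y) - \infty + 2C_2\\
&= -\infty. && (89)
\end{aligned}
$$

On the other hand, we know $\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c > 0$. This is a contradiction. Thus,

$$\lim_{t\to\infty} \mathrm{Dist}(z[t],Y) = c = 0.$$

Similarly, we can show the case when (A.2) holds but (A.1) does not hold, and the case when both (A.1) and (A.2) hold.

The proof of the theorem is complete.

□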
4 Synchronous Byzantine Iterative Algorithm

In this section, we present an iterative algorithm in which each non-faulty agent sends only one message per iteration and keeps minimal memory across iterations. We assume each local cost function $h_j(\cdot)$ has an $L$-Lipschitz continuous derivative.


Algorithm 3 for agent $j$ for iteration $t \ge 1$:

Step 1: Compute $h_j'(x_j[t-1])$, the gradient of the local cost function $h_j(\cdot)$ at point $x_j[t-1]$, and send the estimate and gradient pair $(x_j[t-1], h_j'(x_j[t-1]))$ to all the agents (including itself).

Step 2: Let $\mathcal{R}_j[t-1]$ denote the set of tuples of the form $(x_i[t-1], h_i'(x_i[t-1]))$ received as a result of step 1.

In step 2, agent $j$ should be able to receive a tuple $(w_i[t-1], g_i[t-1])$ from each agent $i \in \mathcal{V}$. For a non-faulty agent $i \in \mathcal{N}$, $w_i[t-1] = x_i[t-1]$ and $g_i[t-1] = h_i'(x_i[t-1])$. If a faulty agent $k \in \mathcal{F}$ does not send a tuple to agent $j$, then agent $j$ assumes $(w_k[t-1], g_k[t-1])$ to be some default tuple.³


Step 3: Sort the first entries of the received tuples in $\mathcal{R}_j[t-1]$ in a non-increasing order (breaking ties arbitrarily), and erase the smallest $f$ values and the largest $f$ values. Let $\mathcal{R}^1_j[t-1]$ be the identifiers of the $n - 2f$ agents from whom the remaining first entries were received. Similarly, sort the second entries of the received tuples in $\mathcal{R}_j[t-1]$ in a non-increasing order (breaking ties arbitrarily), and erase the smallest $f$ values and the largest $f$ values. Let $\mathcal{R}^2_j[t-1]$ be the identifiers of the $n - 2f$ agents from whom the remaining second entries were received. Denote the largest and smallest gradients among the remaining values by $\hat{g}_j[t-1]$ and $\check{g}_j[t-1]$, respectively. Set $\tilde{g}_j[t-1] = \frac{1}{2}\left(\hat{g}_j[t-1] + \check{g}_j[t-1]\right)$.

Update its state as follows.

$$x_j[t] = \frac{1}{n - 2f}\left(\sum_{i\in\mathcal{R}^1_j[t-1]} w_i[t-1]\right) - \lambda[t-1]\,\tilde{g}_j[t-1]. \tag{90}$$
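Step 3 and the update (90) amount to two independent trimming operations followed by a gradient step. The sketch below is an illustration under stated assumptions, not the authors' code: the function name and argument layout are hypothetical, and every agent (including silent faulty ones, via the default tuple) is assumed to contribute exactly one pair.

```python
def algorithm3_update(tuples, f, step):
    """One local update of the form (90).

    tuples: list of (w_i, g_i) pairs, one per agent, so n = len(tuples)
    f:      upper bound on the number of Byzantine agents
    step:   the step size lambda[t-1]
    """
    n = len(tuples)
    if n < 3 * f + 1:
        raise ValueError("the analysis assumes n >= 3f + 1")
    # Trim the f smallest and f largest estimates; keep the middle n - 2f.
    kept_estimates = sorted(w for w, _ in tuples)[f:n - f]
    # Trim the gradients independently of the estimates.
    kept_gradients = sorted(g for _, g in tuples)[f:n - f]
    # Midpoint of the largest and smallest surviving gradients.
    midpoint = 0.5 * (kept_gradients[0] + kept_gradients[-1])
    return sum(kept_estimates) / (n - 2 * f) - step * midpoint
```

With `tuples = [(1.0, -2.0), (2.0, 0.0), (3.0, 1.0), (100.0, 50.0)]`, `f = 1` and `step = 0.5`, the outlying pair is trimmed in both coordinates: the surviving estimates are 2 and 3, the surviving gradients are 0 and 1, and the new estimate is 2.5 - 0.5 * 0.5 = 2.25.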


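Let $\mathcal{C}$ be the collection of functions defined as follows:

$$\mathcal{C} = \left\{ p(x) :\ p(x) = \sum_{i\in\mathcal{N}} \alpha_i h_i(x),\ \ \forall i \in \mathcal{N},\ \alpha_i \ge 0,\ \ \sum_{i\in\mathcal{N}} \alpha_i = 1,\ \text{ and }\ \sum_{i\in\mathcal{N}} \mathbf{1}\left\{\alpha_i \ge \frac{1}{2(|\mathcal{N}| - f)}\right\} \ge |\mathcal{N}| - f \right\}. \tag{91}$$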
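Membership in the collection of valid functions depends only on the weight vector. The following sketch checks the three conditions as they are used later in the proof of Lemma 14 (non-negative weights, weights summing to one, and at least $|\mathcal{N}| - f$ weights of size at least $\frac{1}{2(|\mathcal{N}| - f)}$); the function name and the tolerances are assumptions made for illustration.

```python
def is_valid_weighting(alpha, f, tol=1e-12):
    """Check whether the weights alpha (one per non-faulty agent) define a valid function."""
    num_good = len(alpha)
    if num_good - f <= 0:
        raise ValueError("need more non-faulty agents than f")
    if any(a < -tol for a in alpha):
        return False
    if abs(sum(alpha) - 1.0) > 1e-9:
        return False
    threshold = 1.0 / (2 * (num_good - f))
    heavy = sum(1 for a in alpha if a >= threshold - tol)
    return heavy >= num_good - f
```

For four non-faulty agents and `f = 1` the threshold is 1/6: uniform weights pass, `[1/3, 1/3, 1/3, 0]` passes because three weights clear the threshold, and `[0.7, 0.1, 0.1, 0.1]` fails because only one does.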


³ In contrast to Algorithms 1, 2 and 3 in [19], the adopted default tuple in Algorithm 3 here is not necessarily known to all agents. In addition, the default tuple may vary across iterations.

Each $p(x) \in \mathcal{C}$ is called a valid function. Note that the function $\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x) \in \mathcal{C}$ since $n \ge 3f + 1$ and $|\mathcal{N}| \ge 2f + 1$. For ease of future reference, we let $\tilde{p}(x) = \frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x)$. Define

$$Y = \bigcup_{p(x)\in\mathcal{C}} \arg\min_{x} p(x).$$

Lemma 10. [19] $Y$ is a convex set.

Lemma 11. $Y$ is a closed set.

Lemma 11 is proved in Appendix G. By Lemma 11, Definition 1 is well-defined over $Y$.

4.1 Update Dynamic — Matrix Representation 

Definition 2. [23] For a given graph $G(\mathcal{V}, \mathcal{E})$, a reduced graph $\mathcal{H}$ is a subgraph of $G(\mathcal{V}, \mathcal{E})$ obtained by (i) removing all the faulty agents from $\mathcal{V}$ along with their edges; (ii) removing any additional up to $f$ incoming edges at each non-faulty agent.

Let us denote the collection of all the reduced graphs for a given $G(\mathcal{V}, \mathcal{E})$ by $R_{\mathcal{F}}$. Thus, $\mathcal{V} - \mathcal{F}$ is the set of agents in each element in $R_{\mathcal{F}}$. Let $\tau = |R_{\mathcal{F}}|$. It is easy to see that $\tau$ depends on $\mathcal{F}$, and it is finite.
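To see why the number of reduced graphs is finite, one can count them directly for a complete graph. The sketch below rests on assumptions that go beyond the text: no self-loops, "up to $f$" read as any subset of size $0$ through $f$ of the incoming edges from other non-faulty agents, and independent choices at each non-faulty agent. The function name is hypothetical.

```python
from math import comb


def count_reduced_graphs(n, num_faulty, f):
    """Count reduced graphs of a complete n-agent graph under the stated assumptions."""
    good = n - num_faulty
    if good <= 0 or f < 0:
        raise ValueError("need at least one non-faulty agent and f >= 0")
    incoming = good - 1  # edges arriving from the other non-faulty agents
    per_agent = sum(comb(incoming, k) for k in range(min(f, incoming) + 1))
    return per_agent ** good
```

For `n = 4`, one faulty agent and `f = 1`, each of the three remaining agents may drop zero or one of its two incoming edges, which gives 3 choices per agent and 27 reduced graphs in total.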

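Without loss of generality, assume agents indexed from $1$ through $n - \phi$ are non-faulty, and agents indexed from $n - \phi + 1$ to $n$ are faulty. Let $\mathbf{x}[t-1] \in \mathbb{R}^{n-\phi}$ be a real vector of the local estimates at the beginning of iteration $t$ with $\mathbf{x}_j[t-1] = x_j[t-1]$ being the local estimate of agent $j \in \mathcal{N}$, and let $\tilde{\mathbf{g}}[t-1] \in \mathbb{R}^{n-\phi}$ be a vector of the local gradients at iteration $t$ with $\tilde{\mathbf{g}}_j[t-1] = \tilde{g}_j[t-1]$, $j \in \mathcal{N}$. Since the underlying communication network is a complete graph with $n \ge 3f + 1$, as shown in [22], the update of $\mathbf{x} \in \mathbb{R}^{n-\phi}$ in each iteration can be written compactly in a matrix form.

$$\mathbf{x}[t] = \mathbf{M}[t-1]\mathbf{x}[t-1] - \lambda[t-1]\tilde{\mathbf{g}}[t-1]. \tag{92}$$

The construction of $\mathbf{M}[t]$ and relevant properties are given in [22]. Let $\mathcal{H} \in R_{\mathcal{F}}$ be a reduced graph of the given communication graph, with $\mathbf{H}$ as the adjacency matrix. It is shown in [22] that in every iteration $t$, and for every $\mathbf{M}[t]$, there exists a reduced graph $\mathcal{H}[t] \in R_{\mathcal{F}}$ with adjacency matrix $\mathbf{H}[t]$ such that

$$\mathbf{M}[t] \ge \beta\,\mathbf{H}[t], \tag{93}$$

where $0 < \beta < 1$ is a constant. The definition of $\beta$ can be found in [22].

Equation (92) can be further expanded out as

$$
\begin{aligned}
\mathbf{x}[t] &= \mathbf{M}[t-1]\mathbf{x}[t-1] - \lambda[t-1]\tilde{\mathbf{g}}[t-1]\\
&= \mathbf{M}[t-1]\left(\mathbf{M}[t-2]\mathbf{x}[t-2] - \lambda[t-2]\tilde{\mathbf{g}}[t-2]\right) - \lambda[t-1]\tilde{\mathbf{g}}[t-1]\\
&= \mathbf{M}[t-1]\mathbf{M}[t-2]\mathbf{x}[t-2] - \lambda[t-2]\mathbf{M}[t-1]\tilde{\mathbf{g}}[t-2] - \lambda[t-1]\tilde{\mathbf{g}}[t-1]\\
&= \left(\mathbf{M}[t-1]\mathbf{M}[t-2]\cdots\mathbf{M}[0]\mathbf{x}[0]\right) - \lambda[0]\left(\mathbf{M}[t-1]\mathbf{M}[t-2]\cdots\mathbf{M}[1]\tilde{\mathbf{g}}[0]\right) - \cdots - \lambda[t-1]\tilde{\mathbf{g}}[t-1]\\
&= \mathbf{\Phi}(t-1, 0)\mathbf{x}[0] - \sum_{r=0}^{t-1} \lambda[r]\mathbf{\Phi}(t-1, r+1)\tilde{\mathbf{g}}[r], && (94)
\end{aligned}
$$

where $\mathbf{\Phi}(t-1, r) = \mathbf{M}[t-1]\mathbf{M}[t-2]\cdots\mathbf{M}[r]$ is a backward product, and by convention, $\mathbf{\Phi}(t-1, t-1) = \mathbf{M}[t-1]$ and $\mathbf{\Phi}(t-1, t) = \mathbf{I}$.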
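The equivalence between the recursion (92) and its unrolled form (94) can be checked numerically. The sketch below uses plain Python lists; the helper names are hypothetical, and the matrices, step sizes and gradient vectors are arbitrary inputs rather than the ones constructed in [22].

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def backward_product(mats, hi, lo):
    """Phi(hi, lo) = M[hi] M[hi-1] ... M[lo]; the identity when lo == hi + 1."""
    size = len(mats[0])
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for r in range(lo, hi + 1):
        result = mat_mul(mats[r], result)
    return result


def iterate(mats, steps, grads, x0):
    """Run x[s+1] = M[s] x[s] - lambda[s] g[s] for s = 0, ..., t-1."""
    x = list(x0)
    for s in range(len(mats)):
        mx = mat_vec(mats[s], x)
        x = [a - steps[s] * g for a, g in zip(mx, grads[s])]
    return x


def unrolled(mats, steps, grads, x0):
    """Evaluate Phi(t-1, 0) x[0] - sum_r lambda[r] Phi(t-1, r+1) g[r]."""
    t = len(mats)
    x = mat_vec(backward_product(mats, t - 1, 0), x0)
    for r in range(t):
        term = mat_vec(backward_product(mats, t - 1, r + 1), grads[r])
        x = [a - steps[r] * b for a, b in zip(x, term)]
    return x
```

Calling `iterate` and `unrolled` on the same inputs should return the same vector up to floating-point rounding, which is the content of (94).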
4.2 Correctness of Algorithm 3

Using the coefficients of ergodicity theorem, it is shown in [22] that $\mathbf{\Phi}(t, r)$ is weak-ergodic, and that the rate of the convergence is exponential [2], as formally stated in Theorem 3. Recall that $\tau = |R_{\mathcal{F}}|$, $n - \phi$ is the total number of non-faulty agents, and $0 < \beta < 1$ is a constant for which (93) holds.

Theorem 3. [2] Let $\nu = \tau(n - \phi)$ and $\gamma = 1 - \beta^{\nu}$. For any sequence $\mathbf{\Phi}(t, r)$,

(95)

for all $t \ge r$.

Lemma 12. For all $i, j \in \mathcal{N}$ and for each $t \ge 1$,

t-i 

|xj[t] — Xj[t]\ < {n — (f) max{|u|, -|- L E \[r]{n — ^ 

r=^ 

and for all $i, j \in \mathcal{N}$ and for $t = 0$,

$$|x_i[0] - x_j[0]| \le U - u.$$

The proof of Lemma 12 can be found in Appendix D.

Corollary 3. For $i, j \in \mathcal{N}$,

$$\lim_{t\to\infty} |x_i[t] - x_j[t]| = 0.$$

We present the proof of Corollary 3 in Appendix E.

Let $M[t] = \max_{j\in\mathcal{N}} x_j[t]$ and $m[t] = \min_{j\in\mathcal{N}} x_j[t]$. The following lemma holds.

Lemma 13. Under Algorithm 3, the following holds.

$$\sum_{t=0}^{\infty} \lambda[t](M[t] - m[t]) < \infty.$$

The proof of Lemma 13 is similar to the proof of Lemma 8. For completeness, we present the proof in Appendix F.

Proposition 2. Let $a, b, c, d \in \mathbb{R}$ such that $b < a$, $b \le c \le \frac{1}{2}(a + b)$, $\frac{1}{2}(a + b) \le a \le d$, and there exists $0 \le \xi \le 1$ for which $\frac{1}{2}(a + b) = \xi c + (1 - \xi)d$ holds. Then

$$\frac{1}{2} \le \xi \le 1.$$

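Proof. Suppose, on the contrary, that $0 \le \xi < \frac{1}{2}$. Since $c \ge b$ and $d \ge a$, we have

$$\frac{1}{2}(c + d) \ge \frac{1}{2}(a + b). \tag{96}$$

On the other hand, by the assumptions that $a > b$, and that $\frac{1}{2}(a + b) \ge c$, it holds that

$$d \ge a > \frac{1}{2}(a + b) \ge c, \tag{97}$$

i.e., $d > c$. Then

$$
\begin{aligned}
\frac{1}{2}(c + d) &= \frac{1}{2}c + \frac{1}{2}d\\
&= \xi c + \left(\frac{1}{2} - \xi\right)c + \frac{1}{2}d\\
&< \xi c + \left(\frac{1}{2} - \xi\right)d + \frac{1}{2}d && \text{by (97)}\\
&= \xi c + (1 - \xi)d\\
&= \frac{1}{2}(a + b), && (98)
\end{aligned}
$$

i.e., $\frac{1}{2}(c + d) < \frac{1}{2}(a + b)$. The relations in (96) and (98) contradict each other. Thus, the assumption that $0 \le \xi < \frac{1}{2}$ does not hold, i.e., $\frac{1}{2} \le \xi \le 1$, proving the proposition.

□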
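Proposition 2 can be sanity-checked by random sampling. The sketch below uses the convex combination in the form that the proof manipulates, $\frac{1}{2}(a + b) = \xi c + (1 - \xi)d$, solves it for $\xi$, and records the smallest value seen; the sampling ranges and the function names are arbitrary assumptions.

```python
import random


def xi_weight(a, b, c, d):
    """Solve (a + b) / 2 = xi * c + (1 - xi) * d for xi (requires d > c)."""
    mid = 0.5 * (a + b)
    return (d - mid) / (d - c)


def smallest_xi(trials=10_000, seed=0):
    rng = random.Random(seed)
    smallest = 1.0
    for _ in range(trials):
        b = rng.uniform(-10.0, 10.0)
        a = b + rng.uniform(0.1, 10.0)   # enforces b < a
        mid = 0.5 * (a + b)
        c = rng.uniform(b, mid)          # enforces b <= c <= (a + b) / 2
        d = a + rng.uniform(0.0, 10.0)   # enforces a <= d
        smallest = min(smallest, xi_weight(a, b, c, d))
    return smallest
```

The boundary case `a = 1, b = 0, c = 0, d = 1` gives exactly one half, so the lower bound in the proposition is tight.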


Lemma 14. For each non-faulty agent $j \in \mathcal{N}$ and each iteration $t \ge 1$, there exists a valid function $p(x) = \sum_{i\in\mathcal{N}} \alpha_i h_i(x) \in \mathcal{C}$ such that

$$\tilde{g}_j[t-1] = \sum_{i\in\mathcal{N}} \alpha_i h_i'(x_i[t-1]).$$

Proof. Recall that $\mathcal{R}^2_j[t-1]$ denotes the set of agents from whom the remaining $n - 2f$ gradient values (second entries of the tuples) were received in iteration $t$, and let us denote by $\mathcal{L}_j[t-1]$ and $\mathcal{S}_j[t-1]$ the sets of agents from whom the largest $f$ gradient values and the smallest $f$ gradient values were received in iteration $t$.

Let $i^*, j^* \in \mathcal{R}^2_j[t-1]$ such that $g_{i^*}[t-1] = \check{g}_j[t-1]$ and $g_{j^*}[t-1] = \hat{g}_j[t-1]$. Recall that $|\mathcal{F}| = \phi$. Let $\mathcal{L}^*_j[t-1] \subseteq \mathcal{L}_j[t-1] - \mathcal{F}$ and $\mathcal{S}^*_j[t-1] \subseteq \mathcal{S}_j[t-1] - \mathcal{F}$ such that

$$|\mathcal{L}^*_j[t-1]| = f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|,$$

and

$$|\mathcal{S}^*_j[t-1]| = f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|.$$

We consider two cases: (i) $\hat{g}_j[t-1] > \check{g}_j[t-1]$ and (ii) $\hat{g}_j[t-1] = \check{g}_j[t-1]$, separately.


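Case (i): $\hat{g}_j[t-1] > \check{g}_j[t-1]$. By definition of $\mathcal{L}^*_j[t-1]$ and $\mathcal{S}^*_j[t-1]$, we have

$$\frac{1}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{S}^*_j[t-1]} g_i[t-1] \le \tilde{g}_j[t-1] \le \frac{1}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{L}^*_j[t-1]} g_i[t-1]. \tag{99}$$

Thus, there exists $0 \le \xi \le 1$ such that

$$
\begin{aligned}
\tilde{g}_j[t-1] &= \xi\,\frac{1}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{S}^*_j[t-1]} g_i[t-1] + (1 - \xi)\,\frac{1}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{L}^*_j[t-1]} g_i[t-1]\\
&= \frac{\xi}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{S}^*_j[t-1]} g_i[t-1] + \frac{1 - \xi}{f - \phi + |\mathcal{R}^2_j[t-1] \cap \mathcal{F}|}\sum_{i\in\mathcal{L}^*_j[t-1]} g_i[t-1]. && (100)
\end{aligned}
$$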
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By symmetry, WLOG, assume ^ > i. 


Let k G —1]—By symmetry, WLOG, assume gk\p — l] < — Since —l]U{j*}| = 

/ + 1, there exists a non-faulty agent G Cj[t — 1] U {j*}. Thus, Qj/Jf — 1] > gj[t — 1] > ^j[t — 1], 
and there exists 0 < < 1 such that 

\ {9j[t - 1] +5i[i - 1]) = QjV - 1] = ikgk[t - 1] + (1 - 6)5'j'[i - !]■ (101) 

Let a = gj\t — l],h = gj\t — l],c = gk[t — 1], and d = gji [t — 1]. By Proposition 2, we know that 


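Since $|\mathcal{N}| - f = n - \phi - f = n - 2f + f - \phi = |\mathcal{R}_j^2[t-1]| + f - \phi = |\mathcal{R}_j^2[t-1] - \mathcal{F}| + |\mathcal{R}_j^2[t-1] \cap \mathcal{F}| + f - \phi$, we get

$$
\begin{aligned}
\tilde g_j[t-1] &= \frac{1}{|\mathcal{N}|-f}\Big(|\mathcal{R}_j^2[t-1]-\mathcal{F}| + |\mathcal{R}_j^2[t-1]\cap\mathcal{F}| + f - \phi\Big)\,\tilde g_j[t-1] \\
&= \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} \tilde g_j[t-1] + \frac{f-\phi+|\mathcal{R}_j^2[t-1]\cap\mathcal{F}|}{|\mathcal{N}|-f}\,\tilde g_j[t-1] \\
&= \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} \Big(\xi_k\, g_k[t-1] + (1-\xi_k)\, g_{j_k'}[t-1]\Big) \\
&\quad + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} g_i[t-1] + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} g_i[t-1] \qquad \text{by (100) and (101)} \\
&= \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} \Big(\xi_k\, h_k'(x_k[t-1]) + (1-\xi_k)\, h_{j_k'}'(x_{j_k'}[t-1])\Big) \\
&\quad + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} h_i'(x_i[t-1]) + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} h_i'(x_i[t-1]).
\end{aligned}
$$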

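Define $q(x)$ as follows.

$$
q(x) = \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} \Big(\xi_k\, h_k(x) + (1-\xi_k)\, h_{j_k'}(x)\Big) + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} h_i(x) + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} h_i(x).
\tag{102}
$$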


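In (102), for each $k \in \mathcal{R}_j^2[t-1] - \mathcal{F}$, it holds that $\frac{\xi_k}{|\mathcal{N}|-f} \ge \frac{1}{2(|\mathcal{N}|-f)}$. For each $i \in \mathcal{S}_j^*[t-1]$, it holds that $\frac{\zeta}{|\mathcal{N}|-f} \ge \frac{1}{2(|\mathcal{N}|-f)}$. In addition, we have

$$
\begin{aligned}
\big|(\mathcal{R}_j^2[t-1]-\mathcal{F}) \cup \mathcal{S}_j^*[t-1]\big| &= |\mathcal{R}_j^2[t-1]-\mathcal{F}| + |\mathcal{S}_j^*[t-1]| \\
&= |\mathcal{R}_j^2[t-1]| - |\mathcal{R}_j^2[t-1]\cap\mathcal{F}| + |\mathcal{S}_j^*[t-1]| \\
&= n - 2f - |\mathcal{R}_j^2[t-1]\cap\mathcal{F}| + f - \phi + |\mathcal{R}_j^2[t-1]\cap\mathcal{F}| \\
&= n - \phi - f = |\mathcal{N}| - f.
\end{aligned}
$$

Thus, in (102), at least $|\mathcal{N}| - f$ non-faulty agents, namely the agents $k \in (\mathcal{R}_j^2[t-1]-\mathcal{F}) \cup \mathcal{S}_j^*[t-1]$, are assigned weights lower bounded by $\frac{1}{2(|\mathcal{N}|-f)}$.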


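Case (ii): $\hat g_j[t-1] = \check g_j[t-1]$. Let $k \in \mathcal{R}_j^2[t-1] - \mathcal{F}$. Since $\hat g_j[t-1] \ge g_k[t-1] \ge \check g_j[t-1]$ and $\hat g_j[t-1] = \check g_j[t-1]$, it holds that $\hat g_j[t-1] = g_k[t-1] = \check g_j[t-1]$. Consequently, we have

$$
\tilde g_j[t-1] = \tfrac{1}{2}\left(\hat g_j[t-1] + \check g_j[t-1]\right) = g_k[t-1].
$$

As in Case (i), there exists $0 \le \zeta \le 1$ satisfying (100), and by symmetry we assume $\zeta \ge \tfrac{1}{2}$. So we can rewrite $\tilde g_j[t-1]$ as follows.

$$
\begin{aligned}
\tilde g_j[t-1] &= \frac{1}{|\mathcal{N}|-f}\Big(|\mathcal{R}_j^2[t-1]-\mathcal{F}| + f - \phi + |\mathcal{R}_j^2[t-1]\cap\mathcal{F}|\Big)\,\tilde g_j[t-1] \\
&= \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} g_k[t-1] + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} g_i[t-1] + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} g_i[t-1] \\
&= \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} h_k'(x_k[t-1]) + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} h_i'(x_i[t-1]) + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} h_i'(x_i[t-1]).
\end{aligned}
$$

Define $q(x)$ as follows.

$$
q(x) = \frac{1}{|\mathcal{N}|-f}\sum_{k\in\mathcal{R}_j^2[t-1]-\mathcal{F}} h_k(x) + \frac{\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{S}_j^*[t-1]} h_i(x) + \frac{1-\zeta}{|\mathcal{N}|-f}\sum_{i\in\mathcal{L}_j^*[t-1]} h_i(x).
\tag{103}
$$

In (103), for each $k \in \mathcal{R}_j^2[t-1] - \mathcal{F}$, it holds that $\frac{1}{|\mathcal{N}|-f} \ge \frac{1}{2(|\mathcal{N}|-f)}$; for each $i \in \mathcal{S}_j^*[t-1]$, it holds that $\frac{\zeta}{|\mathcal{N}|-f} \ge \frac{1}{2(|\mathcal{N}|-f)}$. In addition, we have

$$
\big|(\mathcal{R}_j^2[t-1]-\mathcal{F}) \cup \mathcal{S}_j^*[t-1]\big| = |\mathcal{N}| - f.
$$

Thus, in (103), at least $|\mathcal{N}| - f$ non-faulty agents, namely those in $(\mathcal{R}_j^2[t-1]-\mathcal{F}) \cup \mathcal{S}_j^*[t-1]$, are assigned weights lower bounded by $\frac{1}{2(|\mathcal{N}|-f)}$.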


Case (i) and Case (ii) together prove the lemma. 

□ 
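The trimmed gradient $\tilde g_j[t-1]$ analyzed in Lemma 14 can be illustrated numerically. The following is a minimal sketch, assuming the trimming step described in the proof above (discard the $f$ largest and $f$ smallest received gradient values, then take the midpoint of the largest and smallest remaining values). The function name `trimmed_gradient` and the sample data are hypothetical and are not part of the original report.

```python
from typing import Sequence


def trimmed_gradient(gradients: Sequence[float], f: int) -> float:
    """Midpoint of the extreme gradient values that survive trimming.

    Assumption: the f largest and f smallest received values are dropped,
    and the result is (g_hat + g_check) / 2 over the remaining values.
    """
    if len(gradients) <= 2 * f:
        raise ValueError("need more than 2f received gradient values")
    ordered = sorted(gradients)
    remaining = ordered[f:len(ordered) - f]
    g_check, g_hat = remaining[0], remaining[-1]
    return 0.5 * (g_hat + g_check)


# Hypothetical data: five honest gradients and one Byzantine outlier.
honest = [-1.0, 0.0, 0.5, 2.0, 3.0]
byzantine = [1e9]
g_tilde = trimmed_gradient(honest + byzantine, f=1)
print(g_tilde)
```

In this example the outlier is trimmed away and the result stays between the smallest and largest honest gradients, which is the property that lets the proof rewrite $\tilde g_j[t-1]$ as a weighted sum of non-faulty gradients.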

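Proposition 3. For each non-faulty agent $j \in \mathcal{N}$ and each $t \ge 1$, there exists a set of convex coefficients $\beta_i$'s over non-faulty agents, i.e., $\beta_i \ge 0$ for each $i \in \mathcal{N}$ and $\sum_{i\in\mathcal{N}} \beta_i = 1$, such that the following holds:

$$
\frac{1}{n-2f}\sum_{i\in\mathcal{R}_j^1[t-1]} w_i[t-1] = \sum_{i\in\mathcal{N}} \beta_i\, x_i[t-1].
$$

Proof. Note that $\mathcal{R}_j^1[t-1] = (\mathcal{R}_j^1[t-1] - \mathcal{F}) \cup (\mathcal{R}_j^1[t-1] \cap \mathcal{F})$. We consider two cases: (i) $\mathcal{R}_j^1[t-1] \cap \mathcal{F} = \emptyset$ and (ii) $\mathcal{R}_j^1[t-1] \cap \mathcal{F} \ne \emptyset$, separately.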
























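Case (i): $\mathcal{R}_j^1[t-1] \cap \mathcal{F} = \emptyset$. When $\mathcal{R}_j^1[t-1] \cap \mathcal{F} = \emptyset$, every agent in $\mathcal{R}_j^1[t-1]$ is non-faulty, i.e., $\mathcal{R}_j^1[t-1] \subseteq \mathcal{N}$. Then we get

$$
\frac{1}{n-2f}\sum_{i\in\mathcal{R}_j^1[t-1]} w_i[t-1] = \frac{1}{n-2f}\sum_{i\in\mathcal{R}_j^1[t-1]} x_i[t-1] \qquad \text{since } w_i[t-1] = x_i[t-1] \text{ for each } i \in \mathcal{N}.
\tag{104}
$$

Let $\beta_i = \frac{1}{n-2f}$ for each $i \in \mathcal{R}_j^1[t-1] \subseteq \mathcal{N}$, and $\beta_i = 0$ for each $i \in \mathcal{N} - \mathcal{R}_j^1[t-1]$. The obtained $\beta_i$'s form a valid collection of convex coefficients, since $\beta_i \ge 0$ for each $i \in \mathcal{N}$, and

$$
\sum_{i\in\mathcal{N}} \beta_i = \sum_{i\in\mathcal{R}_j^1[t-1]} \beta_i = \sum_{i\in\mathcal{R}_j^1[t-1]} \frac{1}{n-2f} = \frac{1}{n-2f}\,|\mathcal{R}_j^1[t-1]| = \frac{1}{n-2f}\,(n-2f) = 1.
$$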


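Case (ii): $\mathcal{R}_j^1[t-1] \cap \mathcal{F} \ne \emptyset$. Let $\mathcal{L}_j[t-1]$ be the set of the identifiers of the $f$ agents from whom the $f$ largest first entries ($w_i[t-1]$'s) are received, and let $\mathcal{S}_j[t-1]$ be the set of the identifiers of the $f$ agents from whom the $f$ smallest first entries ($w_i[t-1]$'s) are received. Since $\mathcal{R}_j^1[t-1] \cap \mathcal{F} \ne \emptyset$, it holds that $\mathcal{L}_j[t-1] \cap \mathcal{N} \ne \emptyset$ and $\mathcal{S}_j[t-1] \cap \mathcal{N} \ne \emptyset$. Let $l$ and $s$ be two non-faulty agents such that $l \in \mathcal{L}_j[t-1] \cap \mathcal{N}$ and $s \in \mathcal{S}_j[t-1] \cap \mathcal{N}$. By definition of $\mathcal{R}_j^1[t-1]$, for each $k \in \mathcal{R}_j^1[t-1] \cap \mathcal{F}$, we have

$$
x_s[t-1] = w_s[t-1] \le w_k[t-1] \le w_l[t-1] = x_l[t-1].
\tag{105}
$$

Then,

$$
|\mathcal{R}_j^1[t-1] \cap \mathcal{F}|\; x_s[t-1] \;\le \sum_{k\in\mathcal{R}_j^1[t-1]\cap\mathcal{F}} w_k[t-1] \;\le\; |\mathcal{R}_j^1[t-1] \cap \mathcal{F}|\; x_l[t-1].
$$

Thus, there exists $0 \le \zeta \le 1$ such that

$$
\sum_{k\in\mathcal{R}_j^1[t-1]\cap\mathcal{F}} w_k[t-1] = \zeta\Big(|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|\; x_s[t-1]\Big) + (1-\zeta)\Big(|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|\; x_l[t-1]\Big).
\tag{106}
$$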


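Thus,

$$
\begin{aligned}
\frac{1}{n-2f}\sum_{i\in\mathcal{R}_j^1[t-1]} w_i[t-1] &= \frac{1}{n-2f}\left(\sum_{i\in\mathcal{R}_j^1[t-1]-\mathcal{F}} w_i[t-1] + \sum_{i\in\mathcal{R}_j^1[t-1]\cap\mathcal{F}} w_i[t-1]\right) \\
&= \frac{1}{n-2f}\left(\sum_{i\in\mathcal{R}_j^1[t-1]-\mathcal{F}} x_i[t-1] + \sum_{i\in\mathcal{R}_j^1[t-1]\cap\mathcal{F}} w_i[t-1]\right) \qquad \text{since } w_i[t-1] = x_i[t-1] \text{ for each } i \in \mathcal{N} \\
&= \frac{1}{n-2f}\left(\sum_{i\in\mathcal{R}_j^1[t-1]-\mathcal{F}} x_i[t-1] + \zeta\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|\, x_s[t-1] + (1-\zeta)\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|\, x_l[t-1]\right) \qquad \text{by (106)} \\
&= \frac{1}{n-2f}\sum_{i\in\mathcal{R}_j^1[t-1]-\mathcal{F}} x_i[t-1] + \frac{\zeta\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f}\, x_s[t-1] + \frac{(1-\zeta)\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f}\, x_l[t-1].
\end{aligned}
\tag{107}
$$

Let $\beta_s = \frac{\zeta\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f}$, $\beta_l = \frac{(1-\zeta)\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f}$, $\beta_i = \frac{1}{n-2f}$ for each $i \in \mathcal{R}_j^1[t-1] - \mathcal{F}$, and let $\beta_i = 0$ for all other non-faulty agents. The obtained $\beta_i$'s form a valid collection of convex coefficients since

$$
\begin{aligned}
\sum_{i\in\mathcal{N}} \beta_i &= \beta_s + \beta_l + \sum_{i\in\mathcal{R}_j^1[t-1]-\mathcal{F}} \beta_i \\
&= \frac{\zeta\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f} + \frac{(1-\zeta)\,|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f} + \frac{|\mathcal{R}_j^1[t-1]-\mathcal{F}|}{n-2f} \\
&= \frac{|\mathcal{R}_j^1[t-1]\cap\mathcal{F}|}{n-2f} + \frac{|\mathcal{R}_j^1[t-1]-\mathcal{F}|}{n-2f} = \frac{|\mathcal{R}_j^1[t-1]|}{n-2f} = \frac{n-2f}{n-2f} = 1.
\end{aligned}
$$

Case (i) and case (ii) together prove the proposition.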


□ 
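The construction in the proof of Proposition 3 can be checked numerically. The sketch below is illustrative only: it assumes each received first entry is a scalar, that non-faulty agents report $w_i = x_i$, and that at most $f$ of the $n$ senders are faulty. The function name `convex_coefficients` and the sample values are hypothetical.

```python
def convex_coefficients(w, faulty, f):
    """Return (beta, remaining): convex weights over non-faulty agents whose
    weighted sum equals the trimmed mean of the received values w.

    w      : dict agent -> received first entry (w[i] == x[i] if i is non-faulty)
    faulty : set of faulty agent identifiers (unknown to a real agent;
             used here only to verify the proof's construction)
    f      : number of values trimmed from each end
    """
    agents = sorted(w, key=lambda a: (w[a], a))
    n = len(agents)
    smallest, largest = agents[:f], agents[n - f:]
    remaining = agents[f:n - f]
    beta = {a: 0.0 for a in agents if a not in faulty}
    for a in remaining:
        if a not in faulty:
            beta[a] = 1.0 / (n - 2 * f)
    bad = [a for a in remaining if a in faulty]
    if bad:  # Case (ii): some faulty value survived the trimming
        s = next(a for a in smallest if a not in faulty)
        l = next(a for a in largest if a not in faulty)
        total, m = sum(w[a] for a in bad), len(bad)
        # Solve total = zeta * m * x_s + (1 - zeta) * m * x_l for zeta, as in (106).
        zeta = 1.0 if w[l] == w[s] else (m * w[l] - total) / (m * (w[l] - w[s]))
        beta[s] += zeta * m / (n - 2 * f)
        beta[l] += (1.0 - zeta) * m / (n - 2 * f)
    return beta, remaining


# Hypothetical run: n = 7, f = 2, agents 5 and 6 are faulty.
w = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 2.5, 6: 100.0}
faulty = {5, 6}
beta, remaining = convex_coefficients(w, faulty, f=2)
trimmed_mean = sum(w[a] for a in remaining) / len(remaining)
combination = sum(beta[a] * w[a] for a in beta)
print(trimmed_mean, combination)
```

Here the faulty value 2.5 survives trimming, and its contribution is reassigned to one trimmed non-faulty agent on each side, exactly as in Case (ii) of the proof.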


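We define $\{z[t]\}$ and $j_t$ similarly to those for Algorithm 1. In particular, let $\{z[t]\}_{t=0}^{\infty}$ be a sequence of estimates such that

$$
z[t] = x_{j_t}[t], \quad \text{where } j_t \in \arg\max_{j\in\mathcal{N}} \operatorname{Dist}(x_j[t], Y).
\tag{108}
$$

From the definition, there is a sequence of agents $\{j_t\}_{t=0}^{\infty}$ associated with the sequence $\{z[t]\}_{t=0}^{\infty}$.

Theorem 4. The sequence $\{\operatorname{Dist}(z[t], Y)\}_{t=0}^{\infty}$ converges and

$$
\lim_{t\to\infty} \operatorname{Dist}(z[t], Y) = 0.
$$

Proof.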


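$$
\begin{aligned}
\operatorname{Dist}(z[t+1], Y) &= \operatorname{Dist}\big(x_{j_{t+1}}[t+1], Y\big) && \text{by (108)} \\
&= \operatorname{Dist}\Big(\frac{1}{n-2f}\sum_{i\in\mathcal{R}_{j_{t+1}}^1[t]} w_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\Big) && \text{by (90)} \\
&= \operatorname{Dist}\Big(\sum_{i\in\mathcal{N}} \beta_i x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\Big) && \text{by Proposition 3} \\
&= \operatorname{Dist}\Big(\sum_{i\in\mathcal{N}} \beta_i \big(x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t]\big),\; Y\Big) && \text{since } \textstyle\sum_{i\in\mathcal{N}} \beta_i = 1 \\
&\le \sum_{i\in\mathcal{N}} \beta_i \operatorname{Dist}\big(x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\big) && \text{by convexity of } \operatorname{Dist}(\cdot, Y) \\
&\le \max_{i\in\mathcal{N}} \operatorname{Dist}\big(x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\big).
\end{aligned}
\tag{109}
$$

By Lemma 14, there exists a valid function $p_t(\cdot) = \sum_{q\in\mathcal{N}} \alpha_q h_q(\cdot) \in \mathcal{C}$ such that

$$
\tilde g_{j_{t+1}}[t] = \sum_{q\in\mathcal{N}} \alpha_q h_q'(x_q[t]).
\tag{110}
$$

In addition, let

$$
j_{t+1}' \in \arg\max_{i\in\mathcal{N}} \operatorname{Dist}\big(x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\big).
$$

We get

$$
\begin{aligned}
\operatorname{Dist}(z[t+1], Y) &\le \max_{i\in\mathcal{N}} \operatorname{Dist}\big(x_i[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\big) && \text{by (109)} \\
&= \operatorname{Dist}\big(x_{j_{t+1}'}[t] - \lambda[t]\,\tilde g_{j_{t+1}}[t],\; Y\big) \\
&= \operatorname{Dist}\Big(x_{j_{t+1}'}[t] - \lambda[t]\sum_{q\in\mathcal{N}} \alpha_q h_q'(x_q[t]),\; Y\Big) && \text{by Lemma 14} \\
&= \inf_{y\in Y}\Big| x_{j_{t+1}'}[t] - \lambda[t]\sum_{q\in\mathcal{N}} \alpha_q h_q'(x_q[t]) - y \Big| \\
&\le \inf_{y\in Y}\Big| x_{j_{t+1}'}[t] - \lambda[t]\, p_t'\big(x_{j_{t+1}'}[t]\big) - y \Big| + \lambda[t]\, L\, (M[t] - m[t]),
\end{aligned}
\tag{111}
$$

where $p_t$ is defined in (110). Note that for each $t \ge 0$, there exists a non-faulty agent $j_{t+1}'$ such that (111) holds, so there exists a sequence of agents $\{j_{t+1}'\}_{t=0}^{\infty}$. Let $\{x[t]\}_{t=0}^{\infty}$ be the sequence of estimates such that $x[t] = x_{j_{t+1}'}[t]$, and let $\{g[t]\}_{t=0}^{\infty}$ be the sequence of gradients such that $g[t] = p_t'(x[t])$.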


The remainder of the proof is identical to the proof of Theorem 2.

□ 
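The two inequalities that close (109) rely only on the convexity of $\operatorname{Dist}(\cdot, Y)$. A small numeric check is sketched below, assuming $Y$ is a closed interval of the real line (which is what convexity and closedness of $Y$ give in the scalar setting). The helper `dist` and the sample points are hypothetical.

```python
def dist(x, lo, hi):
    """Distance from the scalar x to the closed interval [lo, hi]."""
    return max(lo - x, 0.0, x - hi)


# Hypothetical optimal set Y = [0, 1], three points, and convex weights beta.
Y = (0.0, 1.0)
points = [-2.0, 0.5, 3.0]
beta = [0.2, 0.3, 0.5]

combo = sum(b * p for b, p in zip(beta, points))
lhs = dist(combo, *Y)                                    # Dist of the convex combination
mid = sum(b * dist(p, *Y) for b, p in zip(beta, points)) # weighted sum of distances
rhs = max(dist(p, *Y) for p in points)                   # worst-case distance
print(lhs, mid, rhs)
```

The printed values are ordered as in (109): the distance of the convex combination is at most the weighted sum of distances, which is at most the maximum distance.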


5 Discussion and Conclusion 

So far, a synchronous system has been considered. In an asynchronous system with up to $f$ crash faults, Problem 1 is not solvable, since it is possible that every agent in the system is non-faulty but $f$ agents are slow. In this case, the system will mistakenly treat the slow agents as crashed agents. Consequently, the weights of the slow agents may be strictly smaller than those of the other agents. Although Problem 1 cannot be solved in an asynchronous system, Problem 2
can be solved with /3 > A and 7 > jA/"! — /. In particular, Algorithm 2 can be easily adapted for 
an asynchronous system by modifying the receiving step (Step 2). For completeness, we list the resulting algorithm for crash faults.


Algorithm 4 (crash faults) for agent $j$ for iteration $t \ge 1$:

Step 1: Compute $h_j'(x_j[t-1])$, the gradient of local function $h_j(\cdot)$ at point $x_j[t-1]$, and send the triple $(x_j[t-1],\, h_j'(x_j[t-1]),\, t)$ to all the agents (including agent $j$ itself).

Step 2: Upon receiving $(x_i[t-1],\, h_i'(x_i[t-1]),\, t)$ from $n - f$ non-faulty agents (including agent $j$ itself), where these received tuples form a multiset $\mathcal{R}_j[t-1]$, update $x_j$ as

$$
x_j[t] = \frac{1}{|\mathcal{R}_j[t-1]|} \sum_{i\in\mathcal{R}_j[t-1]} \Big(x_i[t-1] - \lambda[t-1]\, h_i'(x_i[t-1])\Big).
\tag{112}
$$

Note that $|\mathcal{R}_j[t-1]| = n - f$. Since at most $f$ agents may crash, agent $j$ can receive messages from at least $n - f$ agents in Step 2. Thus, Algorithm 4 will always proceed to the next iteration. We are able to show the following theorem.
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Theorem 5. Algorithm 4 solves Problem 2 with f3 = ^ and 'y = n — f. 

The collection of valid functions is defined as follows.

C = I p{x) : p{x) = aihi{x),\/i € V, a* > 0, Oj = 1, and 
iev i&M 



The proof of Theorem 5 is similar to the proof of Theorem 2. 

In an asynchronous system, when there are up to $f$ Byzantine faults, simple iterative algorithms like Algorithm 3 may not exist, observing that it is impossible to achieve Byzantine consensus with a single round of message exchange with only $n = 3f + 1$ agents. In contrast, when the algorithm introduced in [1] is used as a communication mechanism in each iteration, we believe that Algorithm 3 can be modified such that it can solve Problem 2 with $\beta \ge \frac{1}{2(|\mathcal{N}|-f)}$ and $\gamma \ge |\mathcal{N}| - f$. There may be a tradeoff between the system size $n$ and the communication load in each iteration. We leave this problem for future exploration.

Note that the definition of admissibility of the local functions in this report is slightly different from that in [19]. Compared to [19], stronger assumptions are used in proving the correctness of the three iterative algorithms developed in this work. In particular, we require that the local functions have $L$-Lipschitz derivatives. Whether such assumptions are necessary is still open, and we leave this for future exploration as well.
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Appendices 


A Lemma 1 


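Proof. Let $x_1, x_2 \in Y$ such that $x_1 \ne x_2$. By definition of $Y$, there exist valid functions

$$
p_1(x) = C_1\Big(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \alpha_i h_i(x)\Big), \qquad p_2(x) = C_2\Big(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} \beta_i h_i(x)\Big),
$$

such that $x_1 \in \operatorname{argmin} p_1(x)$ and $x_2 \in \operatorname{argmin} p_2(x)$, respectively. Note that it is possible that $p_1(\cdot) = p_2(\cdot)$, and that $p_i(\cdot) = \tilde p(\cdot)$ for $i = 1$ or $i = 2$.

Given $0 \le \alpha \le 1$, let $x_\alpha = \alpha x_1 + (1-\alpha) x_2$. We consider two cases:

(i) $x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$, and

(ii) $x_\alpha \notin \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$.

Case (i): $x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$. By definition of $Y$, we have

$$
x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x) \subseteq Y.
$$

Thus, $x_\alpha \in Y$.

Case (ii): $x_\alpha \notin \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$. By symmetry, WLOG, assume that $x_1 < x_2$. By definition of $x_\alpha$ and the assumption of case (ii), it holds that $x_1 < x_\alpha < x_2$. In particular, it must be that

$$
x_\alpha > \max\big(\operatorname{argmin} p_1(x)\big) \quad \text{and} \quad x_\alpha < \min\big(\operatorname{argmin} p_2(x)\big),
$$

which imply that $p_1'(x_\alpha) > 0$ and $p_2'(x_\alpha) < 0$. There are two possibilities for $\tilde p'(x_\alpha)$: either $\tilde p'(x_\alpha) > 0$ or $\tilde p'(x_\alpha) < 0$. Note that $\tilde p'(x_\alpha) \ne 0$, since $x_\alpha \notin \operatorname{argmin} \tilde p(x)$.

Assume that $\tilde p'(x_\alpha) < 0$. Then, there exists $0 \le \zeta \le 1$ such that

$$
\zeta\, p_1'(x_\alpha) + (1-\zeta)\, \tilde p'(x_\alpha) = 0.
$$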

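By definition of $p_1(x)$ and $\tilde p(x)$, we have

$$
\begin{aligned}
0 &= \zeta\, p_1'(x_\alpha) + (1-\zeta)\, \tilde p'(x_\alpha) \\
&= \zeta C_1\Big(\sum_{i\in\mathcal{N}} h_i'(x_\alpha) + \sum_{i\in\mathcal{F}} \alpha_i h_i'(x_\alpha)\Big) + (1-\zeta)\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i'(x_\alpha) \\
&= \Big(\zeta C_1 + (1-\zeta)\frac{1}{|\mathcal{N}|}\Big)\sum_{i\in\mathcal{N}} h_i'(x_\alpha) + \zeta C_1 \sum_{i\in\mathcal{F}} \alpha_i h_i'(x_\alpha).
\end{aligned}
$$

Thus, $x_\alpha$ is an optimum of function

$$
\Big(\zeta C_1 + (1-\zeta)\frac{1}{|\mathcal{N}|}\Big)\sum_{i\in\mathcal{N}} h_i(x) + \zeta C_1 \sum_{i\in\mathcal{F}} \alpha_i h_i(x).
\tag{113}
$$

Since $p_1(x) \in \mathcal{C}$, it holds that $C_1\big(|\mathcal{N}| + \sum_{i\in\mathcal{F}} \alpha_i\big) = 1$. Then we get

$$
\Big(\zeta C_1 + (1-\zeta)\frac{1}{|\mathcal{N}|}\Big)|\mathcal{N}| + \zeta C_1 \sum_{i\in\mathcal{F}} \alpha_i = \zeta C_1\Big(|\mathcal{N}| + \sum_{i\in\mathcal{F}} \alpha_i\Big) + (1-\zeta) = \zeta \cdot 1 + (1-\zeta) \cdot 1 = 1.
$$

In addition, since $\frac{1}{n} \le C_1 \le \frac{1}{|\mathcal{N}|}$, we get

$$
\frac{1}{n} \le \zeta C_1 + (1-\zeta)\frac{1}{|\mathcal{N}|} \le \frac{1}{|\mathcal{N}|}.
$$

So function (113) is a valid function.

Similarly, we can show that the above result holds when $\tilde p'(x_\alpha) > 0$.
Therefore, set $Y$ is convex.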


□ 


B Lemma 2 

Define an auxiliary function $r(x)$ as follows:

$$
r(x) = \sum_{i\in\mathcal{N}} h_i'(x) + \sum_{i\in\mathcal{F}} h_i'(x)\,\mathbf{1}\{h_i'(x) > 0\}.
\tag{114}
$$


Proposition 4. Function r{x) is continuous and non-decreasing. 

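Proof. Since $h_i(x)$ is convex for each $i \in \mathcal{V}$, it holds that $h_i'(x)$ is non-decreasing. In addition, $\mathbf{1}\{h_i'(x) > 0\}$ is also non-decreasing for each $i \in \mathcal{V}$. Thus, function $r(x)$ is non-decreasing.

For each $i \in \mathcal{V}$, since $h_i(\cdot)$ is continuously differentiable, it follows that $h_i'(\cdot)$ is continuous. That is, $\forall\, \epsilon > 0$, $\exists\, \delta > 0$ such that, for each $i \in \mathcal{V}$,

$$
|x - c| < \delta \implies |h_i'(x) - h_i'(c)| < \frac{\epsilon}{n}.
$$

Then

$$
\begin{aligned}
|r(x) - r(c)| &= \Big| \sum_{i\in\mathcal{N}} h_i'(x) + \sum_{i\in\mathcal{F}} h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - \Big(\sum_{i\in\mathcal{N}} h_i'(c) + \sum_{i\in\mathcal{F}} h_i'(c)\mathbf{1}\{h_i'(c) > 0\}\Big) \Big| \\
&= \Big| \sum_{i\in\mathcal{N}} \big(h_i'(x) - h_i'(c)\big) + \sum_{i\in\mathcal{F}} \big(h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\}\big) \Big| \\
&\le \sum_{i\in\mathcal{N}} |h_i'(x) - h_i'(c)| + \sum_{i\in\mathcal{F}} \big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big| \\
&< |\mathcal{N}|\frac{\epsilon}{n} + \sum_{i\in\mathcal{F}} \big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big|.
\end{aligned}
\tag{115}
$$

When $\mathbf{1}\{h_i'(x) > 0\} = \mathbf{1}\{h_i'(c) > 0\}$, it holds that

$$
\big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big| \le \max\{0,\, |h_i'(x) - h_i'(c)|\} \le |h_i'(x) - h_i'(c)| < \frac{\epsilon}{n}.
\tag{116}
$$

Consider the case when $\mathbf{1}\{h_i'(x) > 0\} \ne \mathbf{1}\{h_i'(c) > 0\}$. Assume $x < c$. As $h_i'(\cdot)$ is non-decreasing, we have

$$
\mathbf{1}\{h_i'(x) > 0\} = 0 \ne 1 = \mathbf{1}\{h_i'(c) > 0\}.
$$

Then,

$$
\begin{aligned}
\big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big| &= |0 - h_i'(c)| = h_i'(c) \\
&\le h_i'(c) - h_i'(x) \qquad \text{since } h_i'(x) \le 0 \\
&= |h_i'(c) - h_i'(x)| < \frac{\epsilon}{n}.
\end{aligned}
\tag{117}
$$

Similarly, we can show $\big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big| < \frac{\epsilon}{n}$ for the case when $x > c$.

By (116) and (117), we can bound (115) as

$$
|r(x) - r(c)| < |\mathcal{N}|\frac{\epsilon}{n} + \sum_{i\in\mathcal{F}} \big| h_i'(x)\mathbf{1}\{h_i'(x) > 0\} - h_i'(c)\mathbf{1}\{h_i'(c) > 0\} \big| < |\mathcal{N}|\frac{\epsilon}{n} + |\mathcal{F}|\frac{\epsilon}{n} = \epsilon.
$$

Thus, function $r(x)$ is continuous.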

□ 
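The auxiliary function in (114) can be evaluated directly, since $h_i'(x)\,\mathbf{1}\{h_i'(x) > 0\} = \max\{h_i'(x), 0\}$. The sketch below is illustrative only and assumes hypothetical quadratic local functions $h_i(x) = \tfrac{1}{2}(x - c_i)^2$, so that $h_i'(x) = x - c_i$; the names `make_gradient` and `r` are introduced here for the example.

```python
def make_gradient(center):
    """Derivative of the assumed local function 0.5 * (x - center)**2."""
    return lambda x: x - center


def r(x, nonfaulty_grads, faulty_grads):
    """Auxiliary function (114): faulty agents contribute only positive gradients."""
    return (sum(g(x) for g in nonfaulty_grads)
            + sum(max(g(x), 0.0) for g in faulty_grads))


# Hypothetical system: two non-faulty agents and one crashed agent.
nonfaulty = [make_gradient(0.0), make_gradient(1.0)]
faulty = [make_gradient(3.0)]

grid = [i / 10.0 for i in range(-50, 51)]
values = [r(x, nonfaulty, faulty) for x in grid]
is_non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
x0 = 0.5  # root of r for this data: 2x - 1 = 0 while x - 3 < 0
print(is_non_decreasing, r(x0, nonfaulty, faulty))
```

For this data, $r$ is non-decreasing on the grid and vanishes at $x_0 = 0.5$, which is the kind of point $x_0$ used in the proof of Lemma 2.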


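Proposition 5. For each valid function $p(x) \in \mathcal{C}$, $\operatorname{argmin}_{x\in\mathbb{R}} p(x)$ is compact.

Proof. Since $\operatorname{argmin}_{x\in\mathbb{R}} h_i(x)$ is compact, and $p(x)$ is a convex combination of the local functions, it follows trivially that $\operatorname{argmin}_{x\in\mathbb{R}} p(x)$ is bounded. Thus, to show $\operatorname{argmin}_{x\in\mathbb{R}} p(x)$ is compact, it remains to show that $\operatorname{argmin}_{x\in\mathbb{R}} p(x)$ is closed.

Let $\{x_t\}_{t=0}^{\infty} \subseteq \operatorname{argmin}_{x\in\mathbb{R}} p(x)$ be a sequence such that

$$
\lim_{t\to\infty} x_t = x^*.
\tag{118}
$$

Recall that $h_i(\cdot)$ is continuous for each $i \in \mathcal{V}$. Then $p(x)$ is also continuous. Thus, (118) implies that

$$
\lim_{t\to\infty} p(x_t) = p(x^*).
\tag{119}
$$

Since every $x_t$ is a minimizer of $p$, the value $p(x_t)$ equals $\min_{x\in\mathbb{R}} p(x)$ for all $t$, so (119) gives $p(x^*) = \min_{x\in\mathbb{R}} p(x)$. Therefore, $x^* \in \operatorname{argmin}_{x\in\mathbb{R}} p(x)$ and $\operatorname{argmin}_{x\in\mathbb{R}} p(x)$ is compact.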

□ 


Proof of Lemma 2 

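Proof. By Lemma 1, we know that $Y$ is convex. To show $Y$ is closed, it is enough to show that $Y$ is bounded and both $\min Y$ and $\max Y$ exist.

For small enough $x$, $h_i'(x) < 0$ for each $i \in \mathcal{V}$. Thus, $r(x) < 0$ for small enough $x$. Similarly, $r(x) > 0$ for large enough $x$. By Proposition 4, we know that function $r(x)$ is non-decreasing and continuous. Thus, there exists $x_0 \in \mathbb{R}$ such that

$$
0 = r(x_0) = \sum_{i\in\mathcal{N}} h_i'(x_0) + \sum_{i\in\mathcal{F}} h_i'(x_0)\,\mathbf{1}\{h_i'(x_0) > 0\}.
$$

Let

$$
p_1(x) = C_0\Big(\sum_{i\in\mathcal{N}} h_i(x) + \sum_{i\in\mathcal{F}} h_i(x)\,\mathbf{1}\{h_i'(x_0) > 0\}\Big),
$$

where $C_0\big(|\mathcal{N}| + \sum_{i\in\mathcal{F}} \mathbf{1}\{h_i'(x_0) > 0\}\big) = 1$. Since

$$
0 \le \sum_{i\in\mathcal{F}} \mathbf{1}\{h_i'(x_0) > 0\} \le |\mathcal{F}|,
$$

it holds that $\frac{1}{n} \le C_0 \le \frac{1}{|\mathcal{N}|}$. Thus, $p_1(x) \in \mathcal{C}$ is a valid function.

Let $a = \min\big(\operatorname{argmin} p_1(x)\big)$. By Proposition 5, $\operatorname{argmin} p_1(x)$ is compact. Thus, $a$ is well-defined. By definition $a \in Y$. Next we show that $a = \min Y$.

Suppose, on the contrary, that there exists $\tilde a < a$ such that $\tilde a \in Y$. Since $\tilde a \in Y$, there exists $q(x) \in \mathcal{C}$ such that $\tilde a \in \operatorname{argmin} q(x)$. That is,

$$
0 = q'(\tilde a) = C\Big(\sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} \alpha_i h_i'(\tilde a)\Big).
\tag{120}
$$

As $C > 0$, from (120), we have

$$
\begin{aligned}
0 &= \sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} \alpha_i h_i'(\tilde a) \\
&\le \sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} \alpha_i h_i'(\tilde a)\,\mathbf{1}\{h_i'(\tilde a) > 0\} && \text{since } h_i'(\tilde a)\,\mathbf{1}\{h_i'(\tilde a) > 0\} \ge h_i'(\tilde a) \\
&\le \sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} h_i'(\tilde a)\,\mathbf{1}\{h_i'(\tilde a) > 0\} && \text{since } 0 \le \alpha_i \le 1 \\
&= r(\tilde a) \\
&\le r(x_0) && \text{since } \tilde a < a \le x_0 \text{ and monotonicity of } r(\cdot) \\
&= 0.
\end{aligned}
$$

Thus, $r(\tilde a) = 0 = r(x_0)$. Since $h_i'(\cdot)$ and $\mathbf{1}\{h_i'(\cdot) > 0\}$ are both non-decreasing for each $i \in \mathcal{V}$, we get

$$
h_i'(\tilde a) = h_i'(x_0), \ \forall\, i \in \mathcal{N}, \quad \text{and} \quad \mathbf{1}\{h_i'(\tilde a) > 0\} = \mathbf{1}\{h_i'(x_0) > 0\}, \ \forall\, i \in \mathcal{F}.
\tag{121}
$$

We obtain

$$
\begin{aligned}
p_1'(\tilde a) &= C_0\Big(\sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} h_i'(\tilde a)\,\mathbf{1}\{h_i'(x_0) > 0\}\Big) \\
&= C_0\Big(\sum_{i\in\mathcal{N}} h_i'(\tilde a) + \sum_{i\in\mathcal{F}} h_i'(\tilde a)\,\mathbf{1}\{h_i'(\tilde a) > 0\}\Big) && \text{by (121)} \\
&= C_0\, r(\tilde a) = 0.
\end{aligned}
$$

That is, $\tilde a \in \operatorname{argmin} p_1(x)$, contradicting the fact that $\tilde a < a = \min\big(\operatorname{argmin} p_1(x)\big)$.

Therefore, $a = \min Y$, i.e., $\min Y$ exists. Similarly, we can show that $\max Y$ also exists.

Therefore, set $Y$ is closed.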

□ 
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C Proof of Proposition 1 

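Proof. For any $t \ge 2$, we have

$$
\begin{aligned}
\ell(t) &= \sum_{r=0}^{t-1} \lambda[r]\, b^{t-r} \\
&= \sum_{r=0}^{\lceil t/2 \rceil} \lambda[r]\, b^{t-r} + \sum_{r=\lceil t/2 \rceil+1}^{t-1} \lambda[r]\, b^{t-r} \\
&\le \lambda[0] \sum_{r=0}^{\lceil t/2 \rceil} b^{t-r} + \lambda[\lceil t/2 \rceil] \sum_{r=\lceil t/2 \rceil+1}^{t-1} b^{t-r} \qquad \text{since } \lambda[t] \le \lambda[t-1],\ \forall\, t \ge 1 \\
&\le \lambda[0]\,\frac{b^{\lfloor t/2 \rfloor}}{1-b} + \frac{b\,\lambda[\lceil t/2 \rceil]}{1-b}.
\end{aligned}
$$

Thus, we get

$$
\limsup_{t\to\infty} \ell(t) \le \lim_{t\to\infty}\Big(\lambda[0]\,\frac{b^{\lfloor t/2 \rfloor}}{1-b} + \frac{b\,\lambda[\lceil t/2 \rceil]}{1-b}\Big) = \frac{\lambda[0]}{1-b}\lim_{t\to\infty} b^{\lfloor t/2 \rfloor} + \frac{b}{1-b}\lim_{t\to\infty} \lambda[\lceil t/2 \rceil] \overset{(a)}{=} 0 + 0 = 0.
$$

Equality (a) follows from the fact that $0 < b < 1$ and the fact that $\lim_{t\to\infty} \lambda[\lceil t/2 \rceil] = 0$. On the other hand, by definition of $\ell(t)$ we know $\ell(t) \ge 0$ for each $t \ge 1$. Thus $\liminf_{t\to\infty} \ell(t) \ge 0$.

Therefore, the limit of $\ell(t)$ exists and $\lim_{t\to\infty} \ell(t) = 0$.

□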
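The decay of $\ell(t)$ can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the original proof: it takes $\ell(t) = \sum_{r=0}^{t-1} \lambda[r]\, b^{t-r}$, and uses the hypothetical choices $b = 0.5$ and $\lambda[r] = 1/(r+1)$, which is non-increasing and tends to zero.

```python
def ell(t, b, step):
    """Evaluate ell(t) = sum_{r=0}^{t-1} step(r) * b**(t - r) (assumed form)."""
    return sum(step(r) * b ** (t - r) for r in range(t))


def harmonic_step(r):
    """Hypothetical diminishing step size lambda[r] = 1 / (r + 1)."""
    return 1.0 / (r + 1)


samples = {t: ell(t, 0.5, harmonic_step) for t in (10, 100, 1000)}
for t, value in samples.items():
    print(t, value)
```

The early terms are damped by the geometric factor $b^{t-r}$ and the late terms by the vanishing step size, which is the same split at $\lceil t/2 \rceil$ used in the proof.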


D Proof of Lemma 12 

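Proof. When $t = 0$, for all $i, j \in \mathcal{N}$ we have

$$
|x_i[0] - x_j[0]| \le \max_{i\in\mathcal{N}} x_i[0] - \min_{j\in\mathcal{N}} x_j[0] = U - u.
$$

Recall (94). For $t \ge 1$,

$$
\mathbf{x}[t] = \mathbf{\Phi}(t-1, 0)\,\mathbf{x}[0] - \sum_{r=0}^{t-1} \lambda[r]\,\mathbf{\Phi}(t-1, r+1)\,\mathbf{g}[r].
$$

Then each $x_i[t]$ can be written as

$$
x_i[t] = \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, 0)\, x_k[0] - \sum_{r=0}^{t-1}\Big(\lambda[r] \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, r+1)\, g_k[r]\Big).
$$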
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Thus 

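$$
\begin{aligned}
|x_i[t] - x_j[t]| &= \Big| \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, 0)\, x_k[0] - \sum_{r=0}^{t-1}\Big(\lambda[r] \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, r+1)\, g_k[r]\Big) \\
&\qquad - \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{jk}(t-1, 0)\, x_k[0] + \sum_{r=0}^{t-1}\Big(\lambda[r] \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{jk}(t-1, r+1)\, g_k[r]\Big) \Big| \\
&\le \Big| \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, 0)\, x_k[0] - \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{jk}(t-1, 0)\, x_k[0] \Big| \\
&\qquad + \Big| \sum_{r=0}^{t-1}\Big(\lambda[r] \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{jk}(t-1, r+1)\, g_k[r]\Big) - \sum_{r=0}^{t-1}\Big(\lambda[r] \sum_{k=1}^{n-\phi} \mathbf{\Phi}_{ik}(t-1, r+1)\, g_k[r]\Big) \Big|.
\end{aligned}
\tag{122}
$$

We bound the two terms on the right-hand side of (122) separately. For the first term in (122), we have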


^ik{t - 1,0)XA:[0] - ^jk{t - l,0)xfc[0] 


k=l 


k=l 


yy {^ik{t -1,0) - ^jk{t - 1,0)) xk[o] 

k=l 
n—(j) 

< \^ik{t - 1,0) - ^jk(t - 1,0)1 |xfc[0]| 

k=l 

n—(j) 

< yy 7^^^ |a^A:[0]| by Theorem 3 


t—1 / n—4> \ t—1 / n—0 

Y1 -'i-,r + l)gk[r] ] -JZi Y1 ^ 


ikit - l,r + l)gk[r] 


r=0 \ k=l 
t—1 / n—d 


r=0 \ k=l 


k=l 

<{n — 4>) max{|tt|, 

In addition, the second term in (122) can be bounded as follows. 

t—1 / n—0 \ t—1 / n—0 

= X] (■^Hy]]^ifc(^-1,^ + 1) -^ikd-^,r + l) I gk[r\ 
r=0 V A:=l / 

t—1 / n—0 \ 

- ^ yz - l,r + 1) - ^ik(t - l,r + 1)1 I \gk[r]\ 

r=0 V k=l J 

t-1 

< L yy A[r](n — ^ ^ by Theorem 3 and the fact that l^fcHI < L 

r=0 

From (123) and (124), the LHS of (122) can be upper bounded by 

t-i 

\xi[t] — Xj[t]\ < {n — 4>) max{|?x|, |C/|}7^^'' + L E \[r]{n — ^ 1. 

r=H 

The proof is complete. 


(123) 




(124) 


□ 




















E Proof of Corollary 3 

Proof. By Lemma 12, for each t > 1, 

\xi[t] — Xj[f]| < {n — 4>) max{|tt|, + L E \[r]{n — ^ ^ 

r=0 

t-1 

< {n — 4>) max{|tt|, |t/|}7“ + L A[r](n — <fi)'y s , 


t-i 


r=0 


and for all $i, j \in \mathcal{N}$. Taking the limit superior on both sides, we get

limsup |xj[t] — < (n — 0) max{|ti|, |[/|} limsup 7 ^ + L(n — 0) limsup (E A[r]7‘ 

i ' -- t^OO i \ 

= 0 + L{n — 4>) lim sup (e A[r]7* - 


t^OO 


t^OO 


't-l 




= 0 + 0 = 0 by Lemma 1, 


proving the corollary. 


F Proof of Lemma 13 

Proof. By Lemma 12, for t > 1 we have 

M[f\ — m[f\ < {n — (f) max{|u|, + L E X[r]{n — 4))^^ <' 

Thus, we get 

oo oo / 

^A[t](M[t] -m[t]) < ^A[t] ( {n-(j))max{\u\,\U\}j^-^ + Kr]{n - 


t-i 


r=0 


t-1 


t=l 


t=l 


r=0 


oo t—1 


□ 


= (n-(/))max{|u|,|[/|}^A[t]7'^-^ + L(n - 0 ) ^ A[t] ^ A[r]7r ^ ’"I. 

t=l t=l r=0 

(125) 

Since A[t] < A[0] for each t > 0, we have 

OO OO 

(n — (f) max{|u|, |[/|} ^A[t]7r^T < (n-(/.)max{|u|,|[/|}A[0]^7r^T 


t=i 


t=i 

oo 


< (n — 4>) max{|u|, |t/|}A[0] ^7" 


t=i 


< {n — 4>) max{|u|, |P|}A[0]- 


1 — 71 


< 00. 


(126) 
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CXD t —1 


OO t—1 


L{n - ^ A[t] X[rY " = L{n- ^)YY 


t=l r=0 


t=l r=0 
OO t—1 


< 


t=l r=0 


r^i 


since A[t]A[r] < 


\^[t] + X^[r] 


,7 


[*-1-^] _|_ L{n — 4>) 


OO t—1 


J^A2[r]7r- 


t=l r=0 

The first term on the RHS of (127) can be bounded as 

t-i 


t=l r=0 


L{n — 4>) 




t=l r=0 


< 


L{n — (j)) 


L{n — 4>) 


t-i 




7 


t=i 

OO 


r=0 




t=l 


1 — 71 


L{n — (j)) 


2 1-7^ 


E ^^1*1 


=1 


< OO since E A^[t] < OO. 

t=i 

For the second term on the RHS of (127), for any fixed T, we get 


L(n — 4>) 


T t-i 


^ J^A2[r]7r 


t— 1 — 7^1 ^ Lin — (()) 


T t-i 


t=l r=0 


2 


r=0 

T-1 


t=0 


Thus, we get 


L(n — (f)) 


OO t—1 


^ J^A"[r]7r 


2 ( 1 - 71 ^) ^ 
L{n — (j)) 


< 


Ea'^ 


r < CX3. 


t=0 r=0 

By (126), (128) and (129), we get 


2 ( 1 - 71 ^)^ 

OO 

A[t] {M[t] — m[t]) < OO. 


t=i 


(127) 


(128) 


(129) 


Y^ A[t] {M[t] — m[t]) = A[0] (M[0] — m[0]) + Y^ A[t] {M[t] — m[t]) 

t=o t=i 

OO 

= A[0](t/ — tt) + Y^ A[t] {M[t] — m[t]) < OO, 

t=i 


In addition 
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proving the lemma. 

□ 


G Proof of Lemma 11 


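Define an auxiliary function $r(x)$ as follows. For each $x \in \mathbb{R}$, let $i_1(x), \ldots, i_{|\mathcal{N}|}(x)$ be a non-increasing order of $h_j'(x)$, for $j \in \mathcal{N}$, i.e., $h_{i_1(x)}'(x) \ge h_{i_2(x)}'(x) \ge \cdots \ge h_{i_{|\mathcal{N}|}(x)}'(x)$. Define $r(x)$ as follows,

$$
r(x) = \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x)}'(x) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x)}'(x).
\tag{130}
$$

Intuitively speaking, $r(x)$ is the largest gradient value among all valid functions in $\mathcal{C}$ at point $x$.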


Proposition 6. Function r(-) is continuous and non-decreasing. 


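Proof. Let $x < y \in \mathbb{R}$. Since $r(y)$ is the largest gradient value among all valid functions at $y$, replacing the order $i_1(y), i_2(y), \ldots$ by the order $i_1(x), i_2(x), \ldots$ cannot increase the weighted sum evaluated at $y$. Hence

$$
\begin{aligned}
r(y) - r(x) &= \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(y)}'(y) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(y)}'(y) \\
&\qquad - \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x)}'(x) - \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x)}'(x) \\
&\ge \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x)}'(y) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x)}'(y) \\
&\qquad - \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x)}'(x) - \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x)}'(x) \\
&= \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big)\big(h_{i_1(x)}'(y) - h_{i_1(x)}'(x)\big) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} \big(h_{i_j(x)}'(y) - h_{i_j(x)}'(x)\big) \\
&\ge 0 + 0 \qquad \text{since } x < y \text{ and } h_i'(\cdot) \text{ is non-decreasing.}
\end{aligned}
$$

Thus, function $r(\cdot)$ is non-decreasing.

Next we show that function $r(\cdot)$ is continuous.

For each $i \in \mathcal{V}$, since $h_i(\cdot)$ is continuously differentiable, it follows that $h_i'(\cdot)$ is continuous. That is, $\forall\, \epsilon > 0$, $\exists\, \delta > 0$ such that, for each $i \in \mathcal{V}$,

$$
|x - c| < \delta \implies |h_i'(x) - h_i'(c)| < \epsilon.
$$

Assume $c < x < c + \delta$. Then

$$
\begin{aligned}
|r(x) - r(c)| &= r(x) - r(c) \qquad \text{by monotonicity of } r(\cdot) \\
&= \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x)}'(x) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x)}'(x) \\
&\qquad - \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(c)}'(c) - \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(c)}'(c) \\
&\le \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big)\big(h_{i_1(x)}'(x) - h_{i_1(x)}'(c)\big) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} \big(h_{i_j(x)}'(x) - h_{i_j(x)}'(c)\big) \\
&< \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big)\epsilon + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} \epsilon = \epsilon.
\end{aligned}
\tag{131}
$$

Similarly, we can show that when $c - \delta < x < c$, $|r(x) - r(c)| < \epsilon$.

Thus, function $r(\cdot)$ is continuous.

The proof is complete.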

□ 
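The function in (130) can be evaluated by sorting the non-faulty derivatives. The following is a minimal sketch under explicit assumptions: the order places the largest derivative first (consistent with the remark that $r(x)$ is the largest gradient value among valid functions), and the local functions are hypothetical quadratics $h_i(x) = \tfrac{1}{2}(x - c_i)^2$. The names `r_byzantine` and `linear_gradient` are introduced only for this example.

```python
def linear_gradient(center):
    """Derivative of the assumed local function 0.5 * (x - center)**2."""
    return lambda x: x - center


def r_byzantine(x, grads, f):
    """Evaluate (130): the largest derivative gets weight 1 - (m - 1) / (2m),
    the next m - 1 largest get weight 1 / (2m), where m = |N| - f."""
    values = sorted((g(x) for g in grads), reverse=True)
    m = len(values) - f
    if m < 1:
        raise ValueError("need |N| - f >= 1")
    top = values[:m]
    return (1.0 - (m - 1) / (2.0 * m)) * top[0] + sum(top[1:]) / (2.0 * m)


# Hypothetical system: |N| = 4 non-faulty agents, f = 1.
grads = [linear_gradient(c) for c in (0.0, 1.0, 2.0, 3.0)]
grid = [i / 10.0 for i in range(-50, 51)]
values = [r_byzantine(x, grads, f=1) for x in grid]
is_non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
print(is_non_decreasing, r_byzantine(0.5, grads, f=1))
```

For this data the weights sum to one and $r(x) = x - 0.5$, so the root $x_0 = 0.5$ plays the role of the point $x_0$ in the proof of Lemma 11.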


Proof of Lemma 11 

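Proof. By Lemma 10, we know that $Y$ is convex. To show $Y$ is closed, it is enough to show that $Y$ is bounded and both $\min Y$ and $\max Y$ exist.

By Proposition 6, we know that function $r(x)$ is non-decreasing and continuous. Thus, there exists $x_0 \in \mathbb{R}$ such that

$$
0 = r(x_0) = \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x_0)}'(x_0) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x_0)}'(x_0).
$$

Let

$$
q(x) = \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x_0)}(x) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x_0)}(x).
\tag{132}
$$

By construction, $q(x) \in \mathcal{C}$ is a valid function. Note that, due to the possibility of ties in the top $|\mathcal{N}| - f$ rankings of the order $i_1(x_0), \ldots, i_{|\mathcal{N}|}(x_0)$ for a given $x_0$, there may be multiple orders over $h_i'(x_0), \forall\, i \in \mathcal{N}$, of the top $|\mathcal{N}| - f$ elements. Let $O$ be the collection of all such orders. Note that there is a one-to-one correspondence between an order $o \in O$ and a valid function $q_o(x)$ defined as in (132). Let

$$
a = \min_{o\in O} \min\big(\operatorname{argmin} q_o(x)\big),
$$

which is well-defined since $\operatorname{argmin} q_o(x)$ is compact, and $|O|$ is finite.

By definition $a \in Y$. Next we show that $a = \min Y$.

Suppose, on the contrary, that there exists $\tilde a < a$ such that $\tilde a \in Y$. Since $\tilde a \in Y$, there exists $\tilde q(x) = \sum_{i\in\mathcal{N}} \alpha_i h_i(x) \in \mathcal{C}$ such that $\tilde a \in \operatorname{argmin} \tilde q(x)$. That is,

$$
\tilde q'(\tilde a) = 0.
\tag{133}
$$

We have

$$
\begin{aligned}
0 = \tilde q'(\tilde a) &= \sum_{i\in\mathcal{N}} \alpha_i h_i'(\tilde a) \\
&\le \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(\tilde a)}'(\tilde a) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(\tilde a)}'(\tilde a) \\
&= r(\tilde a) \le r(x_0) = 0 \qquad \text{by monotonicity of } r(\cdot).
\end{aligned}
$$

Thus, $r(\tilde a) = 0 = r(x_0)$. In addition, we have

$$
\begin{aligned}
r(\tilde a) &= \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(\tilde a)}'(\tilde a) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(\tilde a)}'(\tilde a) \\
&\le \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(\tilde a)}'(x_0) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(\tilde a)}'(x_0) \qquad \text{by monotonicity of } h_i'(\cdot) \\
&\le \Big(1 - \frac{|\mathcal{N}|-f-1}{2(|\mathcal{N}|-f)}\Big) h_{i_1(x_0)}'(x_0) + \frac{1}{2(|\mathcal{N}|-f)} \sum_{j=2}^{|\mathcal{N}|-f} h_{i_j(x_0)}'(x_0),
\end{aligned}
$$

which implies that $i_1(\tilde a), \ldots, i_{|\mathcal{N}|-f}(\tilde a)$ is an order in $O$. Thus, it can be seen that $\tilde a \ge a = \min_{o\in O} \min\big(\operatorname{argmin} q_o(x)\big)$, contradicting the assumption that $\tilde a < a$.

Therefore, $a = \min Y$, i.e., $\min Y$ exists. Similarly, we can show that $\max Y$ also exists.

Therefore, set $Y$ is closed.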

□ 










